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Accomplishments 


In this grant we addressed issues related to the two major components of our proposed 
research for Space-Time Adaptive Processing (STAP) applications, namely (i) the development 
of advanced algorithms for detection and parameter estimation of weak targets in the presence 
of jamming and radar clutter, and (ii) the mapping of the algorithms into massively parallel 
architectures for their high-speed implementation. Our accomplishments during this grant may 
be summarized as follows: 

Advances in Detection and Estimation for STAP Applications 

with Joint Gaussian Statistics 
Investigator: Professor Irving S. Reed 

1. The complete theoretical development of the Cross-Spectral Method , an optimal technique 
for reduced-rank STAP. The performance measure for optimality is maximizing the output 
signal-to-interference plus noise ratio, which is equivalent to minimizing the mean-square 
error of the processor. A proof of optimality has been obtained and, while counter-intuitive, 
the resulting solution is shown to be different from that yielded by the principal component 
approximation of the array covariance matrix. The rank of the resulting weight vector 
is reduced by the cross-spectral method, in contrast to the principal component inverse 
(PCI)/Eigencanceler methods. Finally, a low complexity filtering-based implementation of 
the cross-spectral method has been investigated; 

2. The design of a simultaneous CFAR detection and maximum likelihood (ML) estimation 
STAP algorithm for airborne radar. Generalization of our previous results on element-space 
post-Doppler STAP to fully adaptive, beam-space post-Doppler processors. This study 
demonstrates that the finer resolution leads to a reduction in the detection capability. This 
implies that one may trade-off the accuracy of ML estimation with the performance of the 
CFAR detection criterion; 

3. The development of a fast constant false alarm (CFAR) detection algorithm for STAP. This 
approach is based on the eigen-decomposition of the space-time radar covariance matrix. 
The method utilized by this algorithm does not require a matrix inversion, thereby reducing 
the computational complexity requirements of the processor. 


Robust Detection and Estimation with Alpha-Stable 
Distributions for STAP Applications 
Investigator: Professor Chrysostomos L. Nikias 


] The characterization and development of methods for parameter estimation of symmetric 
alpha-stable ( SaS ) distributions by means of fractional lower- and negative-order moments. 
The modeling of the amplitude statistics of radar clutter by means of SaS distributions 
and the estimation of the parameters of the stable distributions from real clutter of the 

Mountain Top Database; 
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2. The development of an adaptive matched-filter detector for the case of a signal embedded in 
heavy-tailed noise modeled as a sub-Gaussian, alpha-stable process. The implementation of 
a generalized likelihood ratio detector which employs robust estimates of the unknown noise 
underlying matrix and the unknown signal strength. The comparison of the performance 
of the new detector to the performance of the optimum detector under completely known 
signal and Gaussian noise characteristics; 

3. The theoretical analysis on Cramer-Rao bounds for target angle and Doppler estimation for 
airborne radar operating in interference modeled as a Cauchy process. The development of 
optimal, maximum likelihood-based approaches to the estimation problem and the intro¬ 
duction of the Cauchy Beamformer which exhibits robust performance in a wide range of 
impulsive noise environments. The development of new joint spatial- and Doppler-frequency 
high-resolution estimation techniques based on the eigendecomposition of the covariation 
matrix of the space-time radar measurements. 


Scalable Portable Parallel Algorithms for STAP Applications 

Investigator: Professor Viktor K. Prasanna 

1. The development of efficient algorithms for parallel Higher Order Post-Doppler Process¬ 
ing. Our algorithms, which exploit remapping data between various computation steps and 
scheduling the computations to overlap communication, lead to scalable parallel algorithms. 
The previous approach employs row wrap mapping scheme to avoid data remapping. How¬ 
ever, the communication time increases as the number of processors increases. Our scalabil¬ 
ity analysis and experimental results show that the frequent short message communications 
of the earlier fine grain algorithms can be eliminated by data remapping; 

2. The development of parallel weight computation steps of Element Space Pre-Doppler and 
Element Space Post-Doppler STAP, which are the major computational bottlenecks in the 
entire processing. The tasks are partitioned evenly. The parallelism is achieved by dis¬ 
tributing whole tasks among the processing elements. The reason for employing task level 
parallelism is that the problem sizes of the tasks in Element Space STAP are too small to 
parallelize efficiently. Despite advances in network technology, state-of-the-art HPC plat¬ 
forms still consume longer time in communicating than in computing a unit data. Hence, 
coarse grain parallelism is more suitable for these platforms. Thus, eliminating sparse short 
communications by providing data required in each task based on owner compute rule, can 
achieve much greater speed-up; 

3. The development of efficient data distribution algorithms which exploit intermediate data 
remapping schemes and communication contexts. The proposed scheme is a two phase ap¬ 
proach which exploits combine-and-forward techniques. By using intermediate data map¬ 
ping, the introduced algorithms effectively reduce startup overhead and node contention. 
Our algorithms also utilize pipelined communication scheduling. 
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Advances in Detection and 
Estimation for STAP Applications 
with Joint Gaussian Statistics 


Principal Investigator: Professor Irving S* Reed 

Collaborators: Jay Scott Goldstein and Yow-Ling Gau 


This part includes results on optimal techniques with joint Gaussian statistics for reduced-rank 
STAP applications, CFAR detection, and maximum likelihood estimation. First, in Section 1.1, 
the complete theoretical development of the optimal technique for reduced-rank STAP, termed 
the cross-spectral method, has been accomplished. The performance measure for optimality is 
maximizing the output signal-to-interference plus noise ratio, which is equivalent to minimizing 
the mean-square error of the processor. A proof of optimality has been obtained and, while 
counter-intuitive, the resulting solution is shown to be different from that yielded by the principal 
component approximation of the array covariance matrix. The rank of the resulting weight vector 
is reduced by the cross-spectral method, in contrast to the PCI/Eigencanceler methods. Finally, a 
low complexity filtering-based implementation of the cross-spectral method has been investigated, 
which provides excellent performance while alleviating the requirements for covariance matrix 
estimation and eigen-decomposition. 

In Section 1.2, we design a simultaneous CFAR detection and ML estimation STAP algorithm 
for airborne radar. Our previous results on element-space post-Doppler STAP have been gen¬ 
eralized and applied to the fully adaptive and beam-space post-Doppler processors. This study 
demonstrates that the finer resolution leads to a reduction in the detection capability. This im¬ 
plies that one may trade-off the accuracy of ML estimation with the performance of the CFAR 
detection criterion. 

Finally, in Section 1.3, we develop a fast CFAR detection algorithm for STAP applications. 
This approach is based on the eigen-decomposition of the space-time radar covariance matrix. 
The method utilized by this algorithm does not require a matrix inversion, thereby reducing the 
computational complexity requirements of the processor. 
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1.1 The Cross-Spectral Metric and Optimal Rank Reduction 

The problem of optimal detection is closely related to constrained Wiener filtering [1, 2, 3, 4, 5, 6]. 
In fact, the optimal detection criterion relies upon steering a beam in the direction and Doppler 
of interest, while nulling out the effects of interference and clutter in that beam. In this sense, 
the canonical form of the optimal detector takes the form of a partitioned processor termed the 
Generalized Sidelobe Canceller (GSC). The GSC was introduced by Applebaum and Chapman 
[7] for narrowband signals and extended to the space-time problem by Griffiths [8]. In this 
section, the theoretical development of optimal rank reduction for space-time adaptive processing 
is achieved. The cross-spectral metric [9, 10, 11, 12] is derived as the optimal technique to reduce 
the rank of the weight vector while maximizing the output signal-to-interference plus noise ratio 
(SINR). The methodology for achieving this optimization is to minimize the mean-square error 
of the GSC as a function of the weight vector rank. Minimizing the mean-square error and 
maximizing the output SINR for the GSC processor have been shown to be equivalent [6]. Indeed, 
it is easily shown that the weight vector which maximizes the SINR for a constrained processor 
and the weight vector that minimizes the mean-square error are identical. 

It is important to note that other reduced-rank STAP techniques which are based on the 
principal component technique, such as the principal component inverse (PCI) method [13, 14] 
(also recently termed an Eigencanceler [15]) only form a low rank estimate of the array covari¬ 
ance matrix. That is to say, while the covariance matrix is approximated by only the principal 
eigenvectors and their corresponding eigenvalues, the resulting weight vector is still full-rank. 
The approach described herein differs from all other approaches in three significant ways: (1): 
the principal component technique provides the best low-rank approximation to the full rank 
covariance matrix [16]. However, this is not the correct performance measure. The correct per¬ 
formance measure is to find the best low-rank approximation of the covariance matrix (i.e.: the 
best reduced-rank subspace) for maximizing the output SINR . While counter-intuitive, these 
two subspaces are not the same. (2) The actual number of adaptive weights required by the PCI 
technique is identical to that required in the full-rank case. The cross-spectral technique results 
in a weight vector whose size corresponds to the reduced dimension. This means that using the 
cross-spectral approach to reduce the rank of the problem from some large number N to some 
small .number M yields a weight vector of dimension M. Since the computational complexity 
and required hardware increases with the rank of the weight vector, this issue is paramount. (3) 
Finally, the development of the cross-spectral metric naturally lends itself to approximations us¬ 
ing any unitary transform to form a basis for the space spanned by the array covariance matrix. 
Thus, the eigenvectors need not be used and the eigen-decomposition of the array covariance 
matrix need not be performed. Further, since it is possible to realize a unitary operator via fast 
filtering algorithms, such as the DFT and DCT, one may evaluate the cross-spectral metric and 
perform rank reduction without ever forming the array covariance matrix estimate from the data. 
This in turn implies that the rank reduction may be performed via a criterion which is based 

only on simple filtering operations. 

1.1.1 Space-Time Adaptive Processing 

Radar returns are collected in coherent processing intervals (CPI) which may be represented as a 
3-D data, cube as shown in Figure 1.1. The data is then processed at one range of interest, which 
corresponds to a slice of the CPI data cube. This slice is a J X K space-time snapshot, denoted 
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Figure 1.1: 3-Dimensional CPI Data Cube 


X. The space-time snapshot can be expressed as a matrix 
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where the individual elements Xj ^ correspond to the data from the j -th pulse repetition interval 
(PRI) and the k -th sensor element [17, 18]. 

The 2-D space-time data structure consists of element space information and PRI space 
Doppler information. Each return X is composed of components due to the target, the interference 
sources or jammers, clutter, and white noise: 

X = Xt + Xi + X c + X w . (1.2) 

The noise in the space-time snapshot is the thermal noise present at the sensor elements, which 
is spatially and temporally white. Each of the other components of the radar return are now 
addressed. 

The target is assumed to be a point target, present in only one range bin. The J X K 
space-time snapshot corresponding to the target is then given by 

X* = a t b(w 4 )a(i? t ) H , (1.3) 

where a {d t ) is the K x 1 spatial steering vector, b(o> t ) is the J X 1 temporal steering vector, and 
a t is the amplitude of the target. 

The interference environment consists of several sources which are assumed to be temporally 
white (barrage jammers spread over all Doppler frequencies at a particular azimuth). The J X K 
space-time snapshot corresponding to the jammers is then given by 

Xi = (1.4) 
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where a* is the J x I matrix of jammer amplitude for the J PRIs of each of the I interference 
sources and A (#*) is the K X I matrix whose row are the spatial steering vectors for each 
interference source. 

The clutter returns when considering a target at a given range will come from all areas within 
a common range about the radar platform. Consider returns from a circular ring about the 
platform. If each spot on the clutter ring is considered to be a point target, its space-time 
snapshot will take the form 

Xc p = a Cp b(w Cp )a(i? Cp ) // , (1.5) 

where u? Crj and are the spatial and Doppler frequencies of the clutter return from the p-th 
patch, and a Cp is the power from the p-th clutter patch. The clutter is assumed to be stationary 
and it has a velocity relative to the platform housing the radar which is determined by the azimuth 
angle between the array and the clutter patch. For simulation purposes, the clutter returns are 
assumed to come from the entire clutter ring, which is divided into 360 evenly spaced sectors, 
and summed so that 

360 

x c = X! a c P HP^c P M'd Cp ) H , (i.6) 

4>c p —l 

where /? is the constant of linearity between the Doppler frequency and the spatial frequency and 

d 

flc p = t~ cos e c sin 4 > Cp , (1.7) 

with d being the element spacing, A 0 being the wavelength, and 9 C being the fixed elevation angle. 

Each of the components of the space-time snapshots are assumed to be uncorrelated. These 
component matrices are stacked column-wise to form the KJ X 1 vector x = x* + x t * + x c + x w . 
The resulting space-time covariance matrix is the KJ X KJ matrix R^: 

R^ = E[xx^] = R* + R i + R c + Ru/> (1-8) 

where R*, RR c , and R^ are the target signal, interference, clutter, and white noise covariance 
matrices, respectively. 

1 . 1.2 The GSC Form Processor 

The structure of the processor under consideration is the GSC partitioned form processor. The 
received snapshot is of dimension KJ, which is the product of the number of sensor elements and 
the number of Doppler frequency bins. The GSC is implemented by partitioning the received data 
with filters w c and W s . The conventional beamforming filter w c is a vector which enforces the 
look-direction constraint in angle and Doppler. This desired signal is blocked from the adaptive 
processor through the signal blocking matrix W 5 , which in general is of dimension N x KJ where 
N < KJ. For example, in the single linear constraint case, N — KJ — 1. The full row rank 
matrix W s is composed of rows a; such that a*w c = 0 for i = 1,2,.. .KJ. 

As previously described, the KJ X 1-dimensional received signal vector present on the antenna 
elements at time k is denoted by x(fc) and the associated KJ X KJ received input data covariance 
matrix is denoted by R^ = E[x(fc)x i/ (A;)]. The A-dimensional noise subspace data vector x 5 (&), 
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Figure 1.2: The full-rank GSC MVDR processor. 



the scalar beamformed output d(k ), and the scalar beamformed noise estimator y s {k ) are given 

by 


X s (fc) 

d{k) 

Vs{k) 


W ,x(i), 

W?x(*), 

w H x s (k), 


(1.9) 


where w is the iV-dimensional weight vector. Finally, the array output is 


y(k) = (vrf? - w H W s ) x(k). 


( 1 . 10 ) 


Evidently, the mean-square error of the processor is given by the mean-square value of y(k). 
The GSC array in Figure 1.2 converges to the discrete-time Wiener-Hopf solution given by 


w = 

where the observation data covariance matrix is expressed as 


( 1 . 1 !) 


R xs = E[x,(fc)x?(*)] = W.R.Wf, 


( 1 . 12 ) 


and the cross-correlation vector between the noise subspace data vector and the beamformer 
output is given by 

r xad = E [x s {k)d*{k)] = W s RxW c . (1.13) 


5 









Figure 1.3: The full-rank GSC MVDR processor in principal coordinates. 

The minimum mean-square error, denoted P , is found by substituting (1.11) into (1.10) and 
evaluating the mean-square value of y(k ): 

P = EflyMI 2 ] = aj - rg d R^r Xsd , (1.14) 

where a d = wfR ra w c is the variance of the conventional beamformer output. 

The observation data covariance matrix R^ may be expressed in terms of its eigenvectors 
and eigenvalues as 

= UAU h , (1.15) 

where U is a unitary N x N matrix composed of the eigenvectors {V;} •=! and A is the diagonal 
matrix of associated eigenvalues {A;}^. In terms of the principal coordinates of the problem, we 
define the process p(k) = JJ H x s (k). A normal component covariance matrix R p , cross-correlation 
vector r pd , and Wiener filter w w are defined now as follows: 

R P = E[p(k)p H (k)] = V H R Xs \J = A, 

r pd = E[p(k)d*(k)] = U H r Xsd , (1.16) 

w N = Rp 1 r pd = U ff w. 

The GSC in these normal coordinates, depicted in Figure 1.3, is equivalent to the GSC in Figure 
1.2 in terms of its steady-state characteristics. The array output of the GSC in normal coordinates 
is given by 

y = ( W f - U^W S ) x(k) = (wf - w H W s ) x(k) . (1.17) 

Note that the MMSE, 

p = °l - r ^ R p lr P^ = - r f S d R x s lr ^ i 1 - 18 ) 

is conserved by any unitary transformation, including that realized by the operator U. 
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Figure 1.4: The reduced-rank GSC MVDR processor. 

1.1.3 Partially Adaptive Processing 

The problem of reducing the degrees of freedom for an array processor involves selecting a subset 
or some combination of the elements to be adaptively weighted. For notational purposes, we let 
the space spanned by the columns of the fully adaptive array covariance matrix be denoted by 
C N , implying that the observation covariance matrices R Xs and R p are of dimension N X N and 
the vectors r Xs di r pdi w, aa d w n are A X 1-dimensional vectors. The partially adaptive GSC 
shown in Figure 1.4 utilizes an N X M transformation operator W, in place of U in Figure 1.3, 
to form the M-dimensional reduced-rank observation data vector 

z(k) = U H x s (k), (1-19) 

where M < N. The associated M X M reduced-rank covariance matrix is given by 

R z =U H R Xs U. (1.20) 

The data vector z (k) is then processed by the reduced-rank weight vector w M , which is of di¬ 
mension Mxl. It is the selection of the rank reducing operator U which serves as the present 

topic of interest. 

The most popular technique for subspace selection is based upon the principal components 
method [19, 20, 21], This method determines the singular value decomposition of the N X N- 
dimensional covariance matrix R ls and selects the M largest eigenvectors (those corresponding to 
the largest eigenvalues) to form the M-dimensional eigen-subspace & C C N in which the adaptive 
processor operates. However, this technique does not directly consider the MMSE performance 
measure, which is a function of not only the space spanned by the noise covariance matrix but 
also of the cross-correlation between the desired signal and the noise process. It is noted that 
Byerly [22] discovered that the eigenvectors corresponding to the largest eigenvalues were not 
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necessarily the best selection, but there was no derivation provided and the approach obtained 
herein provides a more general solution. 

Following Scharf [3, ch. 8], we examine the problem of reducing the rank of the Wiener filter. 
In Figure 1.4, the rank reducing N X M transformation matrix U is composed of some M columns 
from U which are to be selected. The operator U is constrained therefore to be a subset of M of 
the N possible eigenvectors of R^. This particular constraint allows a direct comparison with 
the principal component technique, which chooses the rank reducing transform to be composed of 
those M eigenvectors corresponding to the largest M eigenvalues. Thus, the particular problem 
at hand is to choose the subspace spanned by a set of M eigenvectors out of the N available such 
that the resulting M-dimensional Wiener filter yields the lowest MMSE out of all (j^) possible 
combinations of eigenvectors. 

Now denote the reduced-rank processor output by y r (k) and the reduced-rank noise estimator 
by y Sr {k). This is illustrated in Figure 1.4. A study of Figures 1.3 and 1.4 suggest that the 
reduced-rank processor output may be expressed as 

Vr(k) = [l . (1.21) 

Denote the weight error vector between the full-rank weight vector and its reduced-rank version 
by 

e = Uw JV -Ww M . (1.22) 

The mean-square value of the reduced-rank processor output y r ( k ) in this notation is computed 
to be 

E[\y r (k)\ 2 ] = P + e H R Xs e, (1.23) 

where P is the full-rank MMSE and P + e ff R Is e is the reduced-rank MMSE. It is now desired 
to choose the rank reducing operator li in such a manner that w M minimizes the additional 
mean-square error incurred by rank reduction, namely the scalar term e^R^e in (1.23), as 

follows: 

min e^R^e = min [(w»Rj /2 - wS U«K?) (wX /2 - <u«K'?) h ] ■ (1.24) 

Then evidently the best solution for (1.24) is to choose U such that w^W^R J s = is 

the best low rank approximation to the vector w^Rp 2 . 

The Wiener-Hopf relationship for the full-rank case, 

R p w w = r pd , (1.25) 


implies that 



( 1 . 26 ) 
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Thus, in order to make the vector w%U h r11 2 be the best low rank approximation to the vector 
w»R T, it is necessary to rank order the terms in (1.26) by their magnitude. Then the rank 
reducing operator U is selected by choosing those M eigenvectors which correspond with the 


largest M values of the sequence of non-negative terms 



(1.27) 


for i = 1,2,.. .77. With this selection, the columns of the reduced-rank covariance matrix R* 
span the M-dimensional cross-spectral subspace ft C C N to provide the lowest MMSE of any 
M-dimensional subspace which is spanned by M of the N columns of U. It is noted that this 
solution is similar to the SVD technique described in [3, ch. 8.4]. Also it is of interest physically 
that the term in (1.27) measures the cross-spectral energy projected along the i-th eigenvector. 

Hence the reduced-rank Wiener filter in the subspace ft is given by 


W M — {U U Yxsd — ^ ^Xsdi 


-UjH 


(1.28) 


where A M is the diagonal matrix composed of the M eigenvalues corresponding to the eigenvectors 
which form U. Clearly, the subspace ft spanned by the columns of eigenvectors corresponding 
with the M largest values of the cross-spectral metric is not the same as the subspace spanned 
by the eigenvectors corresponding with the M largest eigenvalues. This means that the Wiener 
filter in the cross-spectral subspace ft yields a lower MMSE than the Wiener filter in the subspace 

To demonstrate that the cross-spectral metric is optimal for each rank M < 77, consider the 
decomposition of the MMSE performed by the full-rank matrix of eigenvectors U. The full-rank 
MMSE is given by 



(1.29) 


Finally, a comparison of Equations (1.27) and (1.29) demonstrate that the selection of the sub¬ 
space which provides the largest cross-spectral contribution also results in the lowest MMSE as a 
function of the rank of the Wiener filter. Since minimizing the mean-square error and maximizing 
the output SINR are equivalent for the constrained array, the reduced-rank Wiener filter found 
via the cross-spectral metric maximizes the output SINR as a function of the rank of the filter. 


1.1.4 Example Using the Mountaintop Radar Parameters 

A simulation is now considered which utilizes the parameters of the Mountaintop radar to com¬ 
pare the performance of the reduced-rank processors to that of the full-rank, joint domain op¬ 
timal processor. In particular, the output SINR for the full-rank processor, the eigen-subspace 
reduced-rank processor and two cross-spectral subspace reduced-rank processors will be com¬ 
puted. Finally, the output SINR as a function of Wiener filter rank will be presented for these 
processors and contrasted to that resulting from the standard technique of simply choosing the 
largest principal components of the noise covariance matrix. 
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The second cross-spectral subspace processor, mentioned in the preceeding paragraph, utilizes 
the DCT in place of the eigenvectors of the array covariance matrix. This new frequency domain 
processor has an extremely low computational complexity. The trade-off is a minor loss in 
performance, to be demonstrated shortly. 

A. Description of the Mountaintop Radar and the Signal Environment 

The Mountaintop radar employs the Radar Surveillance Technology Experimental Radar (RSTER) 
and the Inverse Displaced Phase Center Array (IDPCA) co-located at the same site [23]. It is 
assumed that the radar is in the RSTER-90 configuration and receive-only mode. The transmit 
frequency is 450 MHz. This radar consists of K =14 elements and J=16 pulses in the PRI. The 
elevation angle 9 is fixed (pre-beamformed), and the azimuth angle <j> is the only free parameter. 

The processing is performed over one range gate of interest, where it is assumed that this 
range gate contains the target signal. Thus, the number of inputs to the adaptive processor is 
KJ = 224. The GSC processor beamforms the target signal in one branch and blocks the target 
signal from the other branch, as described earlier in this section. The noise subspace data vector 
is then of dimension N = KJ - 1 = 223. The corresponding full-rank GSC weight vector is of 
dimension 223 X 1. 

For the purpose of this analysis, three barrage jammers and clutter compose the interference 
environment, and one target signal is present. The radar platform is assumed to be at a height 
of 500 meters, and the platform velocity is 500 meters per second. The target, which is also at a 
height of 500 meters, has a range of 22 kilometers and an azimuth of -30°. The target velocity 
is 250 meters per second. This target has a signal-to-noise ratio (SNR) of 0 dB, represents a very 
small constant radar cross-section target. The three jammers each have an SNR of 40 dB and 
azimuth angles of -60°, 30° and 60°, respectively. The clutter-to-noise ratio is 45.6 dB. 

B. Analysis and Results 

We now examine the performance of the fully adaptive GSC, the partially adaptive eigen-subspace 
GSC, and the partially adaptive cross-spectral subspace GSC processors. To evaluate the per¬ 
formance of the subspace selection techniques, the dimension of the adaptive processor will be 
reduced from N = 223 weights to M = 50. The dimension of the noise subspace eigenstructure 
is 77, which is supposedly the lower bound for rank reduction [19, 20]. 

The power spectrum of the data, averaged over 500 snapshots, is presented in Figure 1.5, 
from which it is evident that the target signal is not discernible. Figure 1.6 presents the full rank 
Wiener filter power spectrum, from which it is seen that the optimal space-time weight vector of 
dimension 223 is capable of attenuating all jammers and clutter while passing the target signal. 
The full-rank Wiener filter SINR is 23.28 dB. Figure 1.7 depicts the rank 50 eigen-subspace 
Wiener filter. The clutter is attenuated, but most of the power from the three jammers is passed 
with relatively high gain, particularly the jammer at an azimuth angle of 30°. The output SINR 
for the eigen-subspace Wiener filter is 7.65 dB, reflecting a loss of 15.63 dB. The power spectrum 
of the rank 50 cross-spectral subspace Wiener filter is depicted in Figure 1.8. It can be seen 
that by simply selecting a different set of 50 eigenvectors, the resulting power spectrum closely 
approximates the full rank optimal and all clutter and jammers are attenuated. The resulting 
output SINR is 23.20 dB, reflecting a loss of only 0.08 dB in reducing the rank from 223 to 50. 
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Figure 1.6: The full-rank Wiener filter power spectrum 
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Figure 1.7: The reduced-rank eigen-subspace Wiener filter power spectrum 
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Figure 1.8: The reduced rank cross-spectral subspace Wiener filter power spectrum 
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Figure 1.9: The output SINR of the reduced-rank processors as a function of the rank of the data 


The performance of these two reduced-rank processors may be evaluated by plotting the 
output SINR as a function of Wiener filter rank. In addition, we are interested in comparing two 
other reduced-rank processors. 

The first additional reduced-rank processor under consideration is provided via the PCI tech¬ 
nique. This method estimates the full rank noise covariance matrix R n = R* + R c + R^ by the 
largest principal components to form R n . For the purpose of this analysis, the corresponding 
eigenvalues are retained in the estimate as opposed to making the common approximation that 
they have a value of unity. The resulting equivalent direct form array has a constrained Wiener 


filter w given by 






(1.30) 


and the array output is y r (k) = w H x(k). Note that the weight vector is full-rank. 

The second additional reduced-rank processor under consideration uses the GSC form array 
with the DCT unitary operator for subspace selection. The use of the DCT matrix Q in place 
of the matrix of eigenvectors U in Figure 1.3 permits the use of an approximation to the cross- 
spectral metric. Thus, the rank reducing transformation U. in Figure 1.4 is composed of some M 
columns of the DCT matrix Q to be selected by a modified version of the cross-spectral metric. 
This modified metric replaces the numerator of (1.27) with the mean-square value of the product 
formed by the i-th DCT output and the beamformed signal d(k). The denominator is replaced 
with the power of the data in the i-th DCT frequency bin. 
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Figure 1.9 depicts the output SINR of these four reduced-rank processors as a function of 
the rank of the data. For the three GSC processors, the rank of the Wiener filter and the rank 
of the data are identical. Thus, this plot depicts the SINR as a function of the Wiener filter 
rank in these cases, whereas the curve corresponding to the PCI method (utilizing the low-rank 
covariance estimate) requires a rank 224 weight vector for each value of the rank along the x- 
axis. It may be seen from Figure 1.9 that the cross-spectral metric with the eigenvector basis 
performs the best, obtaining the optimal SINR at a Wiener filter rank of approximately 56. The 
largest eigenvector technique using the GSC requires a processor of approximately the dimension 
of the noise subspace eigenstructure, corresponding to 77 weights, to obtain the optimal solution. 
The cross-spectral metric approximation with the DCT basis performs better than the GSC 
eigen-subspace processor until the rank of the weight vector reaches the dimension of the noise 
subspace eigenstructure. The PCI-type processor, which forms a low-rank estimate of the noise 
covariance matrix, performs the worst. The PCI approach does not obtain the optimal solution 
until the covariance estimate nearly utilizes all of the principal components. In addition, the 
PCI processor performs very poorly until the number of principal components included in the 
covariance estimate reaches the rank of the noise subspace eigenstructure. Only at that point 
does the PCI technique begin an exponential convergence to the full-rank SINR; conversely, both 
cross-spectral approaches converge exponentially along their entire trajectory. 

1.1.5 Conclusions 

A cross-spectral metric for subspace selection in partially adaptive array processing is derived and 
a proof provided to show that this metric is the optimal performance measure to use in deciding 
which eigenvectors to keep for rank reduction. Also it is demonstrated that a GSC array, oper¬ 
ating in the subspace selected by the cross-spectral subspace estimator, exceeds the performance 
realized by operation in a subspace which is based upon the principal components method. The 
cross-spectral subspace provides a better performance than the eigen-subspace when the con¬ 
ventional beamformer pattern provides any attenuation of the interference. An example and 
analysis are provided on the assumption of exact signal knowledge, although estimated statistics 
also may be utilized in conjunction with an adaptive algorithm. Finally, the performance of these 
reduced-rank techniques are compared to the PCI method and a low computational complexity, 
DCT based, approximation to the cross-spectral metric. It is demonstrated that the GSC form 
array methods outperform the PCI technique and result in reduced-rank weight vectors, whereas 
the PCI method always requires a full-rank filter. 


1.2 A Simultaneous CFAR Detection and ML Estimation STAP 
Algorithm for Airborne Radar 

The algorithm in this section provides both a CFAR detection and a maximum likelihood (ML) 
Doppler-bearing estimator of a target in a background of unknown Gaussian noise. A target is 
detected, and its parameters estimated within each range gate by evaluating a statistical test for 
each Doppler-angle cell and by selecting the cell with maximum output and finally comparing it 
with a threshold. The CFAR performance of the proposed processor is analyzed by the use of the 
sample matrix inversion (SMI) method and is evaluated in the cases of a fully adaptive STAP 
and two partially adaptive STAP’s. The performances of these criteria show that the probability 
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of detection is a function only of the sample size K used to estimate the covariance matrix and a 
generalized signal-to-noise ratio. The choice of the number K is a trade-off between performance 
and computational complexity. The performance curves demonstrate that the finer the resolution 
is, the poorer the detection capability. This means that one can trade off the accuracy of ML 
estimation with the performance of the CFAR detection criterion. 

1.2.1 Introduction 

A requirement of future airborne radar is to detect targets in an interference background which 
is comprised of both clutter and jamming. Conventional radars are capable of detecting targets 
in the time domain (range) and the frequency domain (Doppler frequency). Space-time adaptive 
processing (STAP) radars provide an additional domain (space) for the detection of targets. 
STAP is a two-dimensional adaptive filtering algorithm which uses both temporal and spatial 
filtering to suppress interference. 

In the receive mode, radar target echoes are received usually within a transmit beamwidth. 
But, depending on the array aperture and, more importantly, on the target range, the angular 
sector encompassed by the mainlobe of the receive beam may correspond to a fairly large swath 
of space. Thus, many measurements and an interpolation are needed to more precisely determine 
the bearing of the target. 

Traditionally, the process of target detection and estimation is carried out in sequence by dif¬ 
ferent operational techniques. However, in this section STAP target detection and ML parameter 
estimation are obtained simultaneously, except for a final interpolation, with a single criterion. 
First, the fully adaptive test (defined in [18] as fully adaptive if the process refers to both the 
spatial and temporal domain) is based on the assumption of known signal parameters, range, 
Doppler, the elevation and azimuth angles and a known noise covariance matrix. Then, this test 
is normalized so that it has a unit variance. Finally, the test is extended to find the maximum 
response of the Doppler-angle filter bank. The system block diagram of this fully adaptive STAP 
is shown in Figure 1.10. To simplify the analysis, the responses of the set of Doppler-angle filters 
are assumed to be mutually orthogonal. Under such an assumption the false-alarm probability 
and the probability of detection with a known covariance matrix are found analytically in this 
section. 

Also, the statistical performance of this detector is extended and analyzed by replacing the 
noise covariance matrix by a maximum likelihood (ML) estimate of the noise covariance which 
is based on actual data. It is demonstrated here that this makes the above test into a powerful 
CFAR detection test which simultaneously estimates both the Doppler frequency and bearing 
angle of the target. 

However, the computational complexity for the required number of degrees of freedom makes 
the above fully adaptive STAP difficult to realize in practice [18]. Two classes of reduced- 
dimension estimator-detectors, which are called partially adaptive STAP, are also developed and 
analyzed in this section. A partially adaptive STAP is defined to be a processor which is not 
fully adaptive simultaneously in both the spatial and temporal domain. The partially adaptive 
element-space post-Doppler STAP is shown in Figure 1.11, and the partially adaptive beam- 
space pre-Doppler STAP is shown in Figure 1.12. Finally, these two suboptimal STAP criteria 
are compared with the above-described fully adaptive STAP. 
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1.2.2 Formulation of the Problem 

The system under consideration is a coherent pulsed Doppler radar. For simplicity the radar 
antenna is assumed to be a uniformly spaced linear array antenna consisting of N elements. 
The radar transmits a coherent burst of M pulses. For each pulse repetition interval (PRI) L 
range samples are collected. With M pulses and N receiver channels, the received data for one 
coherent-processing interval (CPI) comprises LMN complex baseband samples, called the LMN 
data cube. 

For the present it is assumed that a target remains in one range gate during a CPI. The 
snapshot of the MN samples of data for the range gate, which contains target signal, is the so- 
called primary data. The secondary data consists of the outputs of K range gates in the vicinity 
of the range gate of a target to be detected. This set of K range gates consists of a subset of the 

remaining (L-l) range gates. 

The noise components of the data vectors consist of thermal noise, clutter and jamming which 
are modeled as zero-mean complex Gaussian random vectors. The noise components of each range 
gate are assumed to share a common covariance matrix and to be mutually independent. 

For the analysis it is desirable to assume that each filter is statistically independent. This is 
achieved approximately by the proper spacing between adjacent filters and beams. 


1.2.3 Derivation of the Test Statistic (when Noise Covariance is Known) 

Let x be a MNxl snapshot of a given range gate, i.e. the observation vector, 

X = [* 11 ,212, ' ’ • > x lNi *21) *22i • • •) *2 Ny‘ ' ' i x M\i X M2, ' ' ' ■ X MN\ ) (1'31) 

where X{j represents the sample from the i th pulse and the j th antenna element. 

Two hypotheses are postulated : 

H 0 : x = n , 

Hi: x = s + n , 


(1.32) 

(1.33) 


where n is a M!N X1 Gaussian noise vector with known covariance matrix R and s is a known 
MNxl target vector with phase angle $ n , Doppler frequency u m and a complex amplitude a 
with random phase <fi. Thus s can be represented by the MN dimensional vector, 

s = ab(u m ) <g) a(# n ) = av(w m , #„) , (l- 34 ) 


where b(u? TO ) is a Mx 1 temporal steering vector, a(<?„) is a NX 1 spatial steering vector, ® denotes 
Kronecker product and v(w m , 0„) = b(w m ) <g> a(0 n ) is a MNx 1 steering vector. 

Since the starting phase (f> is a random variable, the random vector x is not strictly Gaussian. 
However, when conditioned on <p, x is complex Gaussian under both hypotheses H 0 and H\. 
Thus the conditional probability densities of x are 


P(x|ffi,$ 

P(X|# 0 ) 


exp{— [x - av(w m , tf„)] H R x [x - ftv(w m , $„)]} 

tt^IRI 

exp{— x h R~ x x) , 

7±1 


(1.35) 

(1.36) 
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where a = \a\e^ with the starting phase <f>, which is uniformly distributed between [0,2jr], and 
R = £{nn ff }. 

The likelihood ratio test A mn of the densities in (1.35) and (1.36) is 


A 


mn 


P{ 4gi) _ 

P(x\H 0 ) P(x\H 0 ) 

= J exp |2i?e |a*v H (w m , ?? n )R _1 xJ - |a| 2 v H (w m , tf n )R _1 v(u; m , ti n ) j —d<$> 

Hi 

= exp|-|a| 2 v // (a; m ,7?„)R _1 v(w TO ,^„)j/ 0 ^2|a| |v // (w m ,?? n )R _1 x|) < T 0 (1.37) 

H 0 


where I 0 (x) is the modified Bessel function of the first kind. Since a covariance matrix is non¬ 
negative definite, i.e. v H (u m , $ n )R _1 v(6j m , d n ) > 0, the exponential term on the left hand side 
of the above equation can be absorbed into the threshold. Hence the above test becomes 


Io{ 2 


a 


Hi 

v H (u m , ^ n )R -1 x ) < T 0 exp{|a| 2 v^(w m ,# n )R _1 v(u> m ,# n )} 

Hq 




(1.38) 


Since I 0 (x) is a monotonically increasing function of positive x, this test further reduces to 
the following test : 


v^(u> m , # ra )R X x 


-l 


Hi 

> 

< 

H 0 


W) 

2\a\ 2 


= T, 


(1.39) 


Finally, a normalization with respect to the output noise power, yields the following optimum 
group invariant test with threshold T mn for a change of scale and starting phase angle <j> (see 
[26]): 


v H (u m ,d n )R *x 


-l 


mn 


H i 

> 

< 


T 2 


v H (cj m ,'d n )R l v(u> m ,'d n ) A v H (u m ,ti n )R l v(uj m) '& n ) 

11 0 


= T, 


mn 


(1.40) 


It is assumed for the present that R is known except for a multiplicative scale factor. The test 
in (1.40) can be viewed in another way. Now let 


R 1 / 2 v(w m , & n ) = V(w m , <?„), 


v (cj m , $ n ) 


'(^m? ^n)v(lc> m , '&n) 


= v(u m , , & n ) and R ly/2 x = x (1.41) 


A substitution of the above transformations into (1.40) the test r mn becomes 


Tmn — 


V* 


(^m,$n)x 


< 

H 0 


mn 


(1.42) 


Thus, the test is equivalent to a measure of the projection of the whitened observation x onto 
the unit “whitened” steering vector v(u; m? $ n ). 

The conditional pdf of the test r mn under Hi can be derived easily by the following theorem 
(see [29] and [37] ): 
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Theorem 1.1 If x is N m (m x , I m ) and B is an m x m projection matrix of rank k then x ff Bx 
has a noncentral Xk(S) distribution where S = m“Bmi, 

In (1.42) x is Nmn («R" 1 / 2 v, l) and v(u> m , i? n )v H (u; m , d n ) is a projection matrix of rank one. 

Then, by Theorem 1.1 tlie test r mn is equivalent to xliPmn), where p mn is the generalized output 
signal-to-noise ratio of the detector given by 

Pmn = |«| 2 V H (w m ,^n)R _1 v(w m ,l? n ) . (1-43) 


Hence the conditional pdf of r mn under H i is given by 

P(r mn \Hi) = e ~ rmn ~ Pmn I 0 (2y/p mn r mn ) U(r mn ) , (1.44) 

where [/(■) represents the unit step function. By setting p mn equal to zero in (1.44), the condi- 
tional pdf of r mn under H 0 is obtained as 

P{r mn \H 0 ) =e- rmn U(r mn ) . (1.45) 

For the situation of a known steering vector and known covariance, the false-alarm probability, 
denoted by PFA mn i an d the probability of detection, denoted by Pr> mn , are respectively given by 

POO 'T 

P FAmn = Pr{rmn > T mn \H 0 } = / e~ r ™dr mn = e~ T ™ , (1.46) 

JTmn 

POO . POO _ 

PDmn — I P( r mn\Hl)dr mn = I e Trnn Pmn Io (2y^ Pmn r mn ) dr mn , (1.47) 

jTmn JTmn 

where the threshold T mn for a given false alarm probability, is given by 

T mn = - In (PFA mn ) • ( L48 ) 


Actually, the target steering vector is unknown in the real radar problem. For a 

single target, the standard technique is to evaluate the above test statistic over a discrete set of 
Doppler-angle vectors, i.e. forming a Doppler-angle filter bank, and declaring target presence if 
the maximum filter output of the filter bank exceeds the threshold TV Thus, the test r over the 
filter bank is given by 



(1.49) 


It is shown in the next section that this maximization over the filter bank also provides a ML 
estimate of the target steering vector. That means that the detection and estimation of Doppler 
and angle are performed simultaneously. 

For the Doppler-angle filter bank case, by the independence of each Doppler-angle channel, the 
false-alarm probability, P FA , and the overall threshold, T h are related by the following formulae: 


Pfa 


Pr{ Max (r mn )} > T h \H 0 } = 1 - Pr{r n < T h , r 12 < T h , ..., r M N < T h \H 0 } 

1 <m<M 
1-KnZN 


N M 


1 - II II P r{ r ™n < T h \H 0 } = 1 - (1 - e^)^ , 


(1.50) 


n=1 m —1 


and 


T h = — ln[l — (1 — Pfa) ** n ] • 


(1.51) 
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The probability of detection P D for the Doppler-angle filter-bank is given approximately by 
P D = Pr{ Max (r mn ) > T h \Hi) = 1 - Pr{r n < T h , r 12 < T h , ..., r M N < T h \Hi} 

1 <m<M 

MN—1 


= 1 - Pr{r mn < n Pr{n<T h \H 0 } 


k-1 


-L 


> — T rnn Pmn 


^rnn'f'mn. 


!r m n(l ~ e~ Th ) MN ~ 1 


(1.52) 


To arrive at this approximate, but quite accurate, result for large MN, it is assumed that the 
target can only appear in one Doppler-angle channel. 

1.2.4 The Maximum Likelihood Estimator of Target Direction 

Assume that the snapshots x(i) from different range gates i, for i=l,2,...,K, are independent. 
Then the joint pdf P(x(l),x(2),.. .,x(K)) of the observed data from different range gates is 

given by 


P(x(l),...,x(A)) = fij’MO) 


t = l 


1 x MNK f A 

-) |R|- A 'exp -Ex^WR-^W 
' \ 1=1 


(1.53) 


Let X = [x(l),x(2),...,x(A')] 

follows : 


Then this joint pdf P(x(l),x(2),..., x(A")) can be rewritten as 


P(x(l),...,x(X)) = 


MNK 


R| h exp j —trace (x H R *X^j 


}_\ MN |Rj — 1 exp | -trace (r- 1 ^XX H 


(1.54) 


The maximum likelihood estimate of the covariance R is that value of R which achieves the 


maximum, 


Maxi |R| 1 exp (^-trace (r XX j 


Min 


tnjlog |R| + trace (r , 


(1.55) 


where S = ±XX H . By taking the partial derivatives of (1.55) with respect to the components 


of R and setting the result to zero, yields 


d 

dR 


log |R| + trace fR =0 


(1.56) 


(r- 1 ) 7 - (r-^r- 1 ) 


(1.57) 
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so that finally the ML estimate is given by 


R = S, 


(1.58) 


where denotes the partial derivatives with respect to the components of R 

Next, the maximum likelihood (ML) estimate of the complex amplitude a and the steering 

/\ 

vector v(u> m , $ n ) are derived by a use of the ML estimate R in (1.58). The required ML estimate 
of the parameters, a and v(w m , $ n ), are defined by the pair (&, v(u; m , $ n )). This estimate is that 
pair (cv, v(u> m ,$n)) which attains the maximum in the following maximization process : 


Max P ^ 


Of 


O' ^v{uJrn j^n) 

= Max 

Qr,v(u/rri)^n) 

: Max g (a, v(w ro ,tf„)) , 


, v(u> m ,tf n ),R) = Max log P (x|a,v(w m , tf n ),R) 

/ a,v(w m ,tf„) V i 


{[x - av(w m , d n )] H R 1 [x - av(w m , <?„)]]■ 


(1.59) 


where g (a, v(w m , $ n )) is the log-likelihood function (excluding the constant terms). This maxi¬ 
mization is performed first over a , as in [33] and [17], to obtain that value of a for which 


d_ 

da 


g (a,v(u m , 


K)) = 0 


(1.60) 


or 


or 


d_ 

da L 


x^R l -x.-a-x H R 1 v(u m , & n ) - a*v H (uj m , $„)R x x 

+Q'*av ff (w m ,^ n )R -1 v(w m ,i? n )l = 0 , 


x. H R m l v(u m , ■&„,) - a*v Ji (u> m , d n )R 1 v(u > m , K) = 0 • 


*..H 


-l 


(1.61) 

(1.62) 


Hence one obtains 

V H (u>m,d n ) R _1 X 
a — ——-^- 

V (CcJ m ,$ n )R ^ v(u? m , ^ft) 


(1.63) 


A substitution of the above result for d, the estimate in (1.63) of the amplitude a , into the 
log-likelihood function in (1.59) yields finally the ML estimate v(u; m ,$ n ) which is that values of 
v(c^m,$ n ) which achieves 


Max 

v(u/ m ,tfn) 




v H (u m ,tf n )R *x 


-l 


v H (u> m , d n )R -1 v(u; m , tin) 


(1.64) 


1.2.5 Performance Analysis of the Detector (when the Noise Covariance is 
Unknown) 

In the real radar problem, the noise covariance is unknown. The best way to find the required 
maximum likelihood ratio test for detection is to replace the noise covariance matrix with its 
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maximum likelihood estimate using the observed data for a set of different range gates in the 
vicinity of the range gate of the target to be detected. The test statistic under the signal-plus- 
noise hypothesis H\ is derived first. If R is replaced by R in (1.40) and ^(XX^) is substituted 

A 

for R, the test r mn becomes 


|v H (w m ,tf n )R *x | 2 ^ 

~ "a " “ ” “ < J- mn 


v"(w m ,'i9 n )R 


= ( K ) 


v H (a; m , 1 9 n )(XX^)- 1 x 




-*■ mn 


(1.65) 


It is assumed that the observed data from the different range gates are independent and have the 
same statistics. Then a whitened X is obtained as follows : 

X = [x(1),x(2),...,x(^)]=[R 1 / 2 w(1),R 1 / 2 w(2),..,R 1 / 2 w(A')] = R 1/2 W , (1.66) 

where W = [w(l), w(2),..., w(A')] is an MN x K matrix comprised of independently identically 
distributed (i.i.d.) complex Gaussian random variables with a zero mean and a unit variance, 
each random variable [w(/)],-, for i=l,...,MN, /=1,...,K, is distributed N(0,1). By substituting 
R 1 / 2 W for X, the test r mn now becomes 


Tmn — 


\v H (u m , i)R- 1/2 (WW g )- 1 R~ 1 / 2 x| 2 f 1 Tmn_ _ rp' 
v H (u> m , ■# n )R -1 / 2 (WW' ff )~ 1 R -1 / 2 a($ J s) 5 K 


(1.67) 


Let Q be an arbitrary unitary and Hermitian matrix and w be a white Gaussian random vector 
with Cov(w) — Imn « The mean and covariance of Qw is the same as that of w. Thus, Qw 
and w are equivalent statistically since they are both Gaussian and have the same second order 
statistics. It follows that 


QW= W and WW ff = QWW" Q , 

d d 

where denotes equality in distribution. Then r mn in (1.67) becomes 

d 


( 1 . 68 ) 


T mn — 


|v // (cu m ,^)R- 1 / 2 Q(WW // )- 1 QR- 1 / 2 xl 2 T> 

d 79 n )R _1 / 2 Q(WW^) _1 QR _1 / 2 v(u; m , $„) H mn 


(1.69) 


By the Householder transformation (see[35],[34] and Appendix A) one can find next a unitary 
matrix Q such that 


QR m 1 ^v(w m ,i? n ) = i? n )R m 1 v(w m , fln)eie J01 , 


(1.70) 


where ei = [1,0, ...,o] T is a unit vector, 6\ = n + arg([R m 1//2 a(i9 s )]i). [z]i = first coordinate 
component of z, and arg{x + jy) — tan~ l (y/x). By the above result one obtains the following 

equivalent test : 


|^(u; m ,^)R^ 1 a(^)ef(WW^)- 1 QR- 1 x|^ "i 
rmn ~ (v^( Wm ,^)R™ 1 v( Wm ,^ n ))ef(WW // )- 1 e 1 mn ’ 


-U,|2 H 1 


(1.71) 
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or 


|ef(WW H )- 1 QR- 1 /2 x |2 Hi ^ 

< 

Ho 


mn 


[(ww ff )-i] lt 


.. . t 

T 

< •* mn 5 


(1.72) 


where [(WW^) 1 ]i > i represents the (1,1) scalar element of the matrix (WW^) 1 . In the above 
equation, QR" 1 / 2 x can be simplified in the following manner: 

QR -1//2 x = QR -1 / 2 (av(u mi $„) + n) 

= exp { j (ir + arg ([R _1/2 v(u; m , t?n)] 1 ))}a^v ff (w m ,t? n )R- 1 v(« m , $ n )ei + Qw 

(1.73) 


= u 

d 


where w is a MN x 1 white Gaussian random vector and u is a MN x 1 vector with inde¬ 
pendent entries, N(0,1), except that the first component, ui, which is N(m^^,l) with = 

y/pmn exp (V + arg ^[R _ly/2 v(cu m , $ n )]i))}- A substitution of (1.73) into (1.72) yields the test, 

ef(WW ff )“ 1 u| 2 Q , 


^mn — 


T 

<C x mn 


(1.74) 


, [ ( ww H )-i] lfl 

Before deriving the pdf of the one-filter test r mn in (1.74), the pdf of [(WW^)" 1 ]^! is derived 
first. Let u be a MN X 1 vector which is partitioned as follows: 


u = 


U 1 
/ 

u 


(1.75) 


where u\ is a scalar and u is a (MN-1) x 1 vector. Next, W is partitioned in the manner, 


W 


'1 

*H 


(1.76) 


where zf is a 1 X K vector and Z H is a (MN-1) X K matrix. From (1.76) one obtains the 

jrr 

following partition for WW : 


WW H = 


1—1 

N 

N 

1_ 

zf Z ' 


' A n 

A12 

_ Z H z, 

Z^Z 


A 21 

A22 


(1.77) 


From [37], the inverse (WW^) 1 of the partitioned matrix WW^ is given by the following 
formula: 


(WW ^)" 1 = 


1 __ 


(WW^)" 1 

1,1 

(WW^)- 1 

1,2 


Bn B 12 



(WW 11 )- 1 

2,1 

(WW 77 )- 1 

2,2 


B 21 B 22 


(1.78) 


where 


Bn 

B 21 

B 12 

B22 


= ^An - Ai2A 22 1 A2l^ , 

= ““A22"A21 (An - Ai 2 A 22 1 A 2 ij 

* 

= - (An - Ai 2 A2 2 1 A 2 i) 


-1 


A 12 A 2 2^ , 

-1 


= A 22 1 A 2 i ^An - Ai 2 A 22 1 A 2 i^ A^A^ 1 + A 22 . 


(1.79) 
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Then, by the above formula 77 


WW 


~i 


Ji,i 


is obtained as 


V = 


Z 1 Z 1 


-zfz( 


z"z) 


-1 


rwH„ 
Z 7 i\ 


-1 


d Xk-n+i 


( 1 . 80 ) 


(see Appendix B), where is a complex normalized chi-squared random variable with m degrees 
of freedom. A substitution of (1.79) into (1.78) and then along with (1.80) into (1.74), yields the 
following result for r, 


ran 


/VI ■ 

U \ [(WW)' 

+ 1 
J 1,1 1- 

;ww) -1 ’ 

2 

/ 

U 

1,2 

1 mn — 

(WW)" 

T 

- 1,1 


= v 


«i -zfZ(Z ff Z) _1 u 


(1.81) 


Since the summation of the independent Gaussian random variables is also a Gaussian random 
variable, one obtains 


Ul -z?Z(Z H Z )-V = N(m' Zmn , 1 ) - (zf Z(Z h Z)~ 2 Z h zi) 1 ^ 2 N(0, 1 ) 

a 

1/2 _ 


(l + zfZ(Z H Z) _ 2 Z // zi) u , 


(1.82) 


where u is N(m , Zrnn , 1). Thus r mn in (1.81) becomes 


ran 


V | ~ |2 

= T « 


C 


(1.83) 


Here £ is equivalent to a beta distributed variate (see Appendix B) defined as follows : 

-l 


c = (l + zfZ(Z // Z)" 2 Z H z 1 ) 


(1.84) 


The pdf of this beta distribution is given by 

1 


P( 0 = 


■c 


K-MN +1 


B(MN -1,I< - MN + 2) 
where B(M,N) is the beta function defined by 

(M - l)!(iV - 1 )! 


( 1-0 


MN-2 


(1.85) 


B(M, N) = 


(.M + N - 1 )! 


( 1 . 86 ) 


Since the magnitude squared of a nonzero mean, say m, Gaussian random variable is equivalent 

to a non-central variate, one has |u| 2 = Xi(Pmn)- This test r mn in (1.83) finally 

d 

becomes 


^Vnn — 


Xl ( Pmn ) 
d XK-MN +lC 


(1.87) 


Since (n 2 x 2 , (Pmn))/(«iXn 2 ) is a noncentral F distribution (see[31], [29] and [37]), the conditional 
pdf of r mn , under £ and H i, is given by 


i Pmn 


P ('--K. *> = B(UK- MN +1){1 + rJy<-^ K ~ MN + 2 ’ TT?^> (1 ' 88 » 


PmnC^ 
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where \F\(a-,b-,x ) is the confluent hypergeometric function defined by 

a(a + 1 ) • • • (a + k — 1 ) x k 

,Fi(a;6; *) = g t(il+ 1} ... (t+ k _ t) JP 

K —U 

Averaging P(r mn \C, Hi) over one has the conditional pdf of r mn , under Hi, as follows: 

1 


(1.89) 


P(r mn \Hi) = 


x 


/o 


B{ 1 , K - MN + l)B(MAf - 1, K - MN + 2) 

1 & —Pmn /~K — MN- 1~2/1 a\MN 2 ^ 

C ZF&- (I< -MNP 2,1, P ™^ r )d(. (1.90) 


(1 + r mn () K ' 


mn 


Setting p mn = 0 in the above equation, the conditional pdf of r mn , under Ho, is given by 


P(r mn H 0 ) — 


1 


l 


1 (-K-MN +2 


( 1-0 


MN -2 


(1 + r mn Q K ~ MN+2 


d(l 1.91) 


5(1, K -MN + 1)B(MN - 1, K - MN + 2) 

From (1.46) and (1.91) the false-alarm probability PFA mn of a single Doppler-angle filter is 
given finally by 

1 

Pfa, 


x 


'■mn 


5 ( 1 , K - MN + 1)B(MN - l, K - MN + 2) 

• 1 ^K-MN+2M _ qMN-2 


mn 


d^dr 


mn 


Jo (l 


C 


+ T' C 

■ mnh 


(1 + r mn C ) A '- M7V+2 

K-MJV+l 

(1-C) WJV — 2 dC .(1.92) 


B(MN - 1, K - MN + 2) 

A 

From (1.50) and (1.91) the false-alarm probability of the Doppler-angle filter bank, P FA , is given 

by 

1 

Pfa = 1 


x 


5(1, I< - MN + 1 )B(MN - 1, K - MN + 2 ) 

' mn 

■ 1 £ K-MN+2/i _ qMN-2 


J 1 


(1 + r mn QK-MN+2 


■d,C,dr 


mn 


(1.93) 


Note that the above expressions for Pp Amn and Pfa are functions only of K( the number of 
range gates used to estimate the covariance ), N (the number of antenna elements ), and M (the 
number of pulses in CPI). Hence these false-alarm probabilities are independent of the signal and 
noise power. As a consequence, the test in (1.65) is a CFAR criterion. 

The probability of detection for the single Doppler-angle channel case is obtained from (1.47) 

and (1.90 as follows : 

1 


Pd 


mn 


5(1, K - MN + 1 )B(MN - 1 , I< - MN + 2) 

•oo rl e -pmn^K-~MN+2^ „ QMN-2 


X 


j;j 


= 1 - 


(1 + r mn Q K ~ MN + 2 

g — Pmn 


iFi(K - MN + 2,1, P ™i rm - )d(dr 


i + (r 


mn 


mn 


B(MN -1,K - MN + 2) 

1 00 n k 

C K-MN+ 1 (1 _ C) MN- 2 Y Pmnj (fc + 1, K - MN + l)c?C) (1-94) 


So 


k—0 


mn 
1 + C^mn 
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where I x (a,b) is an incomplete beta function. 

Similarly, the probability of detection for the Doppler-angle filter matrix case can be obtained 
finally from (1.52),(1.90) and (1.91) as follows: 


Pd = 1- 


5(1, I< -MNP 1 )B(MN - 1, K - MN + 2) 


x 


ri 


1 e-pmn(^k -MN+2 (1 _ f\MN 2 


(i-C) 


(1 + r mn QK-mn +2 


iFi(K - MN + 2 , 1 , 


PmnQ^mn 


5(1, I< - MN + 1 )B{MN - 1, K - MN + 2) 


r( 

Jo Jo 


1 + C r mn 
1 ^K-MN+2 ^ _ qMN—2 

(1 + r mn 0 K ~ MN+2 


)d(dr 


mn 


d(dr 


mn 


X 

MAT-1 

(1.9.5) 


The probabibity of detection as a function of the generalized signal-to-noise ratio, p mn , is 
shown in Figures 1.13 and 1.14 for M=N=10 and M=N=5, respectively. This leads to the two 
kinds of curves, plotted on the same figures, one for a single channel and another for a filter 
bank. With the same parameters (M, N and K), the performance of the single channel is better 
than that of the filter bank. From Figures 1.13 and 1.14, one can find the effect of increasing the 
numbers M and N in the filter bank on the performance. As the number of filters in the filter 
bank becomes larger, the performance of the filter bank is much degraded from that of the single 
channel. The effect of sample size K on the performance is not very significant for M=N=10 in 
Figure 1.13, but about 2 dB of degradation as K goes from 5MN to 2MN as shown on Figure 1.14 

for M=N=5. 


1.2.6 Performance Analyses of Two Partially Adaptive STAP Detectors 

Two types of partially adaptive STAPs, element-space post-Doppler STAP and beam-space pre- 
Doppler STAP, are discussed in this section. Both of these criteria have the advantage of reduced 
dimension but both sacrifice some performance. The element-space post-Doppler STAP uses non- 
adaptive temporal filtering first and then performs the spatial filtering adaptively. In contrast, the 
beam-space pre-Doppler STAP performs in the reverse order, i.e. non-adaptive spatial filtering 
first, followed by adaptive temporal filtering. 

CASE 1: Element-Space Post-Doppler STAP 

Let y m be the Nxl output vector of the m th Doppler filter. Then a snapshot of data in the 
presence of a target x within the m th Doppler filter is expressed by 


y m = (f m <g> I n ) h x = (f% <g> Ijv)(s + n) = a{%b{u m ) a(0 n ) + ( f m <2> I/v)"n 


H. 


(1.96) 


where f m represents a Mxl DFT vector for the m th Doppler filter. 

Since the DFT is a set of M linear operations, one has n m — (f m ® Ij^) H n so that the output 
noise of the m th filter is a Gaussian random vector. The Doppler channel output noise vectors 
n m for 111=1,2,• * -,M are statistically independent [38] by the assumption that the correlation 
structure of the noise is much less than the duration of the observations. The detection test with 
a known covariance matrix for this element-space post-Doppler STAP is given by 


= max< 

m,n 


[a g (^)Rm 1 yml 2 1 

a# (tfjR^a^) J 


> 

< 

H 0 


T, 


mn 


(1.97) 
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where R m = E{ n m n^} is the m th Doppler channel output noise covariance. The maximum of 
this test is obtained over all beams for each Doppler filter. This STAP detector suppresses the 
interference and noise by adaptively placing nulls in the angular domain only, and using DFT 
filtering in the Doppler frequency domain. 

The ML estimate a(# n ) of the target angle for a given Doppler frequency, u; m , is obtained by 
the same argument used in the previous section for estimation. That is, a($ n ) is that value of 
a($ n ) which obtains the maximum, 


max 

a(tfn) 



(1.98) 


Note that this STAP detector provides simultaneous detection and ML estimation for the target 
angle only. 

Similarly, the false-alarm probability for the single Doppler channel, PFAmni an d for the 
Doppler- filter bank, Pfa > are given respectively by 


PFAmn = e~ Tmn , (1.99) 

P FA = 1- (l-e- Th ) MN . (1.100) 

The probability of detection for a single-Doppler channel, Pc) mn , and for the Doppler-filter bank, 
Pjj, are given, respectively, by 


poo 

PUmn — I e 77171 ^ 77171 1 0 (2\/pmn r mn)dv 

J Tmn 


mn > 


Pd 


f Th 

= 1_ / 

JO 


i —T' mn 


Pmn i^ (2 v / p mn r mn )dr mn (l 


e -T h jMN-l 


( 1 . 101 ) 

( 1 . 102 ) 


where the generalized signal-to-noise ratio p mn is given by 


Pmn 


af%b(u m ) a"(?? n )R rn 1 a(i?„) . 


H 


(1.103) 


When the noise covariance R m of the m th Doppler filter output is unknown, R m in (1*97) is 

A 

replaced by its ML estimate R m which is given by 


Rm = Y,y^( k )ym(k) , (1.104) 

A k =l 

where y m (k ) represents the observed data at the m th Doppler filter output from the nearby 
k th range gate. Then, the false-alarm probability with unknown covariance for single-Doppler 
channel, PFAmni an< ^ for the Doppler-filter bank, P F Ai are given, respectively, by 


Pfa 


mn 


5(1, K — N + 1 )B(N - 1, K - N + 2)(K - N + 1) 

K-N+l 

(1 - 


X 


a r 


c 


-I -T 1 C 

» mn S 


(1.105) 


Pfa = 1 - 


f / 

l 


5(1, K-N + 1 )B{N - 1, I< - N + 2) Jo (1 + r mn Q K ~ N + 2 


L 


1 (K-N+2( 1 _ QN-2 


d(dr 


mn 


MN 

(1.106) 
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A 

The probability of detection PDmn w ^h unknown covariance for a single Doppler channel and 

A 

Pd for the Doppler-filter bank are given, respectively, by 


Pd, 


Pmn 


= 1 - 


B(N - 1, I< - N + 2) 


r 1 

/ < 

JO 


K-N +1 


(i-C) 


N—2 


°o ~k 

( k + 1 ’ K ~ N + i W 


k =0 


i+CT, 


(1.107) 


Pd = 1 


5(1, K-N + 1 )B(N - 1, K - N + 2) 


1 „-PmnfK-N+2M _ QN-2 
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2 r 1 ^A^—AT-|-2 ^ 2 

5(1, I< - N+ l)B(N - 1, I< - N + 2) J 0 (1 + r mn Q K - N + 2 
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(1.108) 


CASE 2: Beam-Space Pre-Doppler STAP 

Let z n be the Mxl output vector of the n th beam. Then a snapshot of data x with a target 
present and z n are related by 

Zn = (lM<8>gn) F x= (I M ®gf)(s + n) = agfb(u; m ) a(tf„) + (Im ® g„) F n , (1.109) 

where g n represents a Nxl spatial DFT vector for the n th beam. 

By the same arguments used in Case 1 the final detection test with a known covariance matrix 
for the beam-space pre-Doppler STAP is given by 


max 


|b^(w nl )R n 1 z 5 


m,n I b H (0J m )Rnb(u> m ) 


H\ 

> rp 

< mn 

H 0 


( 1 . 110 ) 


where R n represents the output noise covariance of the n th beam. In contrast to Case 1 the beam- 
space pre-Doppler STAP suppresses the interference by placing nulls in the Doppler-frequency 
domain and in the angle domain by using fixed (non-adaptive) spatial DFT filtering only. 

Similarly, the ML estimate b(u> m ) of Doppler frequency is that value of b(o; m ) which obtains 
the maximum, 


max 


b H (oj m )K n l z : 


b (“'”>) 1 b H (w m )R m 1 b(w m ) 


(un) 


In contrast to Case 1, the detector for this case provides simultaneous adaptive detection and 
ML estimation of the Doppler frequency only. 

All of the performance results of the test for the beam-space pre-Doppler STAP are obtained 

2 u 

by simply exchanging N with M and substituting p mn = ag^a(# n ) b H (u) m )tt~ l b(u> m ) for p mn 
in the corresponding generalized signal-to-noise ratio for the element-space post-Doppler STAP. 

The probability of detection of these partially adaptive STAP’s as a function of the gener¬ 
alized signal-to-noise ratio p mn , is shown in Figures 1.15 and 1.16 for M=N=10 and M=N=5, 
respectively. The effect of sample size K on the performance is presented in theses figures. 
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Performance comparisons between fully and partially adaptive STAP are made in Figures 1.17 
and 1.18 for M=N=10 and M=N=5. The performance difference between partially and fully 
adaptive STAP is not large for a sample size K which is five times the number of degrees of 
freedom for M—N=10 and M=N=5. With sample size K which is twice the number of degrees 
of freedom the performance degradation of partial STAP with respect to full STAP processing 
is about 5 dB for M=N=10 and about 7 dB for M=N=5. Interestingly, the performance of the 
fully adaptive STAP with M—N=5 and K~5MN is about the same as that of partially adaptive 
STAP with M—N=5 and K=5N. 

1.2*7 Conclusion 

A simultaneous CFAR detection and ML estimation algorithm is developed for both fully and 
partially adaptive STAP in this study. The performance of these CFAR tests are analyzed for 
a known and a maximum likelihood estimated covariance matrix. The performance curves of 
the probability of detection show that the performance is a function of the sample size K used 
to estimate the covariance. The choice of the number K is a tradeoff between performance and 
computational complexity. The optimum number is between two and five for K. The performance 
curves also show that the finer the resolution is, the worse the detection performance is expected 
to be. 

1.3 A Fast CFAR Detection Space-Time Adaptive Processing 
Algorithm 

All of the conventional CFAR detection algorithms which use space-time processing involve a 
time-consuming matrix inversion operation. Based on today’s technology, this computational 
complexity sometimes makes the full-rank solution unrealizable. In this section, a CFAR detection 
algorithm, which does not need a matrix inversion, is developed by an adaptation and extension of 
Hotelling’s principal component method studied recently [13] by Kirsteins and Tufts. If also the 
HT method [30] is incorporated into this new algorithm, the computational complexity is reduced 
to O (N) from O (N 3 ) for the matrix inversion used in the conventional adaptive algorithm. 

1.3.1 Introduction 

Future airborne radar are expected to be required to provide long-range detection of targets with 
increasingly smaller cross-sections. This function must be performed in environments where the 
clutter can be quite severe. Space-Time Adaptive Processing (STAP) refers to multidimensional 
adaptive filtering algorithms that simultaneously combine the signals from the elements of an 
array antenna and the multiple pulses of a coherent radar waveform to suppress interference and 
provide target detection. 

The first published work on STAP for radar was by Brennan and Reed [17] in 1973, in which 
optimum space-time filtering was described. Later in 1986 a more all-encompassing generalized 
likelihood ratio test (GLRT) on STAP for radar detection was derived by Kelly [24]. Kelly used 
the sample matrix inverse (SMI) technique of Reed, Brennan and Mallett in [25]. Recently Chen 
and Reed [26] developed a simpler CFAR detection test than the GLRT obtained by Kelly. Robey, 
Fuhrmann, Kelly and Nitzberg [27] also developed independently the same result as Chen and 
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Reed. Although this new CFAR test showed a slightly degraded performance , it was easier to 
analyze than the GLRT. However, all of the previous STAP algorithms involved a time-consuming 
matrix inversion operation. 

The principal components (PC) technique was developed originally by Hotelling [28]. Hotelling 
showed that the dimension of the problem often could be reduced without sacrificing too much 
of the information contained in the covariance matrix. Muirhead’s book [29] is an excellent refer¬ 
ence for the PC method. A new unnormalized GLRT which uses the PC technique is developed 
by Kirsteins and Tufts in [13], but they do not obtain the probability density functions (PDF) 
needed to evaluate the performance of the test statistic. 

In this study the normalized LRT in [26] and [27] is modified by a use of the PC technique. 
This test exhibits the very desirable property that no matrix inversion is needed and that the 
computation complexity can be reduced to O(N) by incorporating the HT method found recently 
by Hung and Turner in [30]. 

The PDF of this new detection statistic is derived here for both the noise-alone case and 
the signal-plus-noise case. In the noise-alone case, the PDF proves to be very simple and the 
probability of a false-alarm (PFA) is shown to depend only on the number of parameters and the 
sample data size. Thus this new PC test is a CFAR criterion. 

1.3.2 Derivation of the Test Function 

The system under consideration is a coherent pulsed Doppler radar. For simplicity the radar 
antenna is assumed to be a uniformly spaced linear array antenna, consisting of N a elements. 
The radar transmits a coherent burst of N p pulses. For each pulse repetition interval ( PRI ) N r 
range samples are collected. With N p pulses and N a receiver channels, the received data for one 
coherent-processing interval ( CPI ) comprises N r N p N a complex baseband samples, the so-called 

N r N p N a data cube. 

It is assumed in the present treatment that a target remains in only one range gate during 
a CPI. The snapshot of the N = N p N a samples of data for the range gate, which is assumed to 
contain the target signal, is called the primary data. The secondary data consists of the outputs 
of L range gates in the vicinity of the range gate of the target to be detected. 

The noise components of the data vectors consist of thermal noise, jamming or clutter, which 
are modeled as zero-mean, complex Gaussian, random noise vectors. The noise components of 
each range gate are assumed to share a common covariance matrix and to be mutually indepen¬ 
dent. 

Let x be a Nxl snapshot of a given range gate, i.e. the observation vector, 

X = [xi, X2, ■ • •, Xn] T ■ (1.112) 

Two hypotheses are postulated : 

H 0 : x = n , (1.113) 

Hi: x = s + n , (FH4) 

where n is a N X 1 Gaussian noise-plus-clutter vector with known covariance matrix R and s is 
a known Nxl target vector with a complex amplitude a and a random phase <f>. Thus s can be 
represented by the N-dimensional vector, s = av, where v is a Nxl steering vector. Then the 
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usual decision rule, which is based on the matched filter for incoherent detection (see [26]), is 
given by 


r — 



Ht 

> 

< 

H 0 



(1.115) 


A normalization with respect to the output noise power yields a group-invariant test with respect 
to the groups of change-of-scale and starting phase as follows: 


v ff R _1 x 

v ff R -1 v 



(1.116) 


This test is independent of the scale of the assumed known covariance matrix R. The generalized 
output signal-to-noise ratio, associated with the detection test in (1.116) is given by 



(1.117) 


where E{} denotes the expectation-value operator. 


1.3.3 Rank Reduction of the Clutter-Plus-Noise Covariance 


The NxN covariance matrix R of the background clutter and noise can be modeled as 

R = + <t 2 I, 


(1.118) 


where a 2 1 represents the covariance of the thermal noise and U$U f/ equals the covariance of 
the interference. Here U is the NxN eigenvector matrix, and is the eigenvalue matrix of R 
with the eigenvalues Ai > A 2 > ... > An > 0. The following matrix inversion formula, verified in 
the Appendix D, namely, 

(A + BCD) -1 = A -1 - A -1 B (DA -1 B + C -1 ) 

readily yields the inverse of the noise covariance in the following form: 


DA 


-l 


R -1 = 


(u<frU H + a 2 l) 1 
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= a 
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-2 


-2 


-2 


Ijv - u (u^U + <7 2 $ -1 ) X U 
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I N - U (iff + cr 2 $ -1 ) U 
I N - UAU^ 
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A = 
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Ai+c 2 
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A 2 +<t 2 
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0 


0 
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An. 

Ajv+ cr2 


(1.119) 


( 1 . 120 ) 


( 1 . 121 ) 
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Note that A;/(A; + a 2 ) = 1 if A; > a 2 and A;/(A; + ct 2 ) = A 8 /«7 2 if A; < a 2 . From these 
observations, noted first by Kirsteins and Tufts, R -1 can be approximated by a reduced rank 
matrix, denoted by RjT 1 , using the principal component method. The reduced rank k of R _1 is 
selected in accordance with the condition, 

\k > to 2 and Afc+i < ta 2 , 

where t is a specified number greater than 1. According to the above selected value of k, the 
matrices A and U are partitioned as follows: 

A = A 0 k ° , U = [U fc ,U r ], (1.122) 

where is a kxk diagonal matrix, A r is a (N-k)x(N-k) diagonal matrix, is a Nxk matrix 
and U r is a Nx(N-k) matrix. In terms of the partitions of matrices A and U, the covariance 
matrix inverse R” 1 in (1.120) becomes 

R- 1 = cr — 2 [ijv - UfcA^Uf ] - (7 -2 U r A r U^, (1.123) 

where A& = I& and 

Afc + i 0 ... 0 

1 0 \k-i-'2 0 

A - — 

2* r — 9 

<J Z 

• • • 

0 0 ... A tv 

Now, in the principal component method R” 1 is approximated by the matrix R^ 1 , given by 

Rfc 1 = a” 2 [ijv-UfcAfeUf] +0(a- 4 ) 

= a- 2 [i w -UjkUf 

= a~ 2 P k , (1.125) 

where O(-) denotes, “order of”, A k = I k and P k = - U fc Uf J represents the projection 

operator of the subspace which is orthogonal to the principal noise component subspace with 
projection operator UfcUf. 


(1.124) 


Appendix A: Householder Transformation 

Let z = [z\ , Zh •••, Zn^ be a complex MN X 1 vector, rj = z^z, e x = [l,0,...,0] r and v = 
z _|_ jyeie je , where 9 = tan and z\ = xi + jyi. The Housholder transformation matrix H is 
given by 



2vv h 


(1.126) 
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Then H^H = I n and Hz = -rje x e? 9 . 

Proof 

The first equality is immediate. Next 


Hz = z — 


2vv H z 

yH y 


z — (z + rje ie^ e ) 


2(z + rje\ei e ) H z 


— z — (z + rje ie^ 6 ) 


(z + rje iei e ) H (z + rje \e^ 6 ) 
2{rj 2 + rjei H ze~i e ) 


rj 2 + rje~i e e\ H z + rje+i e z H ei + rj 2 


= z - (z+,,ei ei> ) ^ltXi^-1)) 

/ , ji&\V + { x i cos0 + yi sin 6) + j(y 1 cos9 - x 1 sin9) 

— z - (z + rjeie J )-7 ---:—t:- 

v V + {x\ cos 9 + yi sin 9) 


— rje\e 


j{0+n) 


since y\ cos 9 — xi sin 9 — f iyi - z 1 / 1 2 = 0. QED 

V^i+^i V X i+V\ 


(1.127) 


Appendix B: Derivation of the pdf of 77 1 

The pdf of t ? _1 = zfzi — z^Z(Z H Z)~ 1 Z H z\ is derived in this appendix by a use of the Cochran’s 
theorem [37]. 

Cochran’s Theorem: Assume that x is a n X 1 zero mean, white Gaussian, random vector, 
i.e. N n ( 0,I n ) and C is a symmetric matrix. Then 

x H Cx = xl 

a 

if and only if 


C 2 — C and rank (C) = k . 

Next, rj~ l can be arranged to be in the quadratic form x^Cx as follows: 

t]- 1 = zf Zl -zfZ(Z ff Z) _1 Z H z 1 

= zf (i- Z(Z H Z)~ 1 Z H ) zi 

= zf (I - P) z x , (1.128) 

where 

P = Z(Z H Zy l Z H (1.129) 

is evidently a projection matrix with rank equal to N-l. Note also that (I — P) is also a projection 
matrix of rank ( K — N + 1). Since the entries of Z\ are i.i.d. N(0,1) and (I — P) 2 = (I — P), one 
has by Cochran’s theorem that 


zf(I-P) Zl = XK-N+V 


(1.130) 
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Appendix C: Derivation of the pdf of ( 

The pdf of C is derived in this appendix. One can also find a similar derivation in [5]. By (1.84) 

C= (l + zfZ(Z H Z) _2 Z H z 1 ) _1 . (1.131) 

Next let 

5 = z^Z(Z H Z)~ 2 Z H zi. (1.132) 

Since Z is positive definite, one can find the QR decomposition of Z by the Householder trans¬ 
formation such that 


Z = Q zRz > 


(1.133) 


where Q z is a K X (N-l) matrix with Qz — I/v-i and Rz is a (N-l) x (N-l) lower triangular 
matrix. The QR decomposition is performed by successively applying the Householder transfor¬ 
mation to each row of the matrix Q z- From the top row to the bottom row, one row at a time, 
one transforms the matrix Z. After the (N-l)-st transformation, one obtains 


ZQiQ 2 * • Qiv-i = Rz — 


Z K 


i 


0 

ZK—i 


\e j02 0 


0 

0 


ZK-N+2 


e J&K-N +2 


,(1.134) 


where Q; is the Householder transformation matrix with respect to the i th row of length K-i+1 
and ||zj|| represents the vector norm of length j. 

By a substitution of Z — QzRz into (1*132), S becomes 


s = zfQzRz(RfRz) 2 RfQfzx 


- zfR^Ri'zx 


(1.135) 


where zi = Qfzi. Next, apply the Householder transform by picking a unitary matrix Q, such 
that 


Q 9 Z! = y/EfZ 1 e N . 1 e^ ar9 ^ N - 1 '> . 


(1.136) 


Then, one obtains the following result: 


5 - 


zf 

zfzie^! (r^R^ 1 ) Zi 


-H-d- 1 


~ H ~ 

Zi Zi 


{ K z IiR z 1 )] N _ hN _ 1 


_ Xat-i 
d XK-N+2 


(1.137) 
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Finally by (1.134) one finds that R^R^ 1 is given by 

(1.138) 

Then by (1.131),(1.137) and the fact that if Y = xl and Z = xh are independent, then one 

d d 

obtains Y/(Y+Z) = /3(|n, |m), where (3(a, b) is a beta distributed variate. Hence the final result 

d 

is given by 

<= = P{K -N + 2, IV -1) . (1.139) 

1 + Od 

Appendix D: Verification of the Inversion Formula in (1.119) 

A -1 - A -1 B (DA -1 B + C -1 ) -1 DA -1 (A + BCD) 

= I - A _1 B (DA -1 B + C- 1 )' 1 D - A _1 B (DA^B + CT 1 )” 1 DA^BCD + A^BCD 
= I - A _1 B (DA -1 B + C -1 ) 1 C _1 CD + A _1 B (dA _1 B + C _1 ) * DA _1 BCD + A _1 BCD 
= I - A -1 B [(DA _1 B + C -1 ) -1 C -1 + (DA _1 B + C -1 ) 1 DA _1 B - I CD 

= I — A -1 B (DA^B + C' 1 )" 1 (DA-'B + C _1 ) - I CD = I QED 
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Figure 1.10: Fully adaptive STAR 
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Figure 1.11: Element-space, post-doppler STAP. 


36 



X A Vffi 

=E 

a 

£ 



C <D 

£ S 

a <d 
< 13 


Figure 1.12: Beam-space, post-doppler partial STAP. 
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Figure 3 System Block Diagram of Beam- Space Post- Doppler Partial STAP Estimator- Detector 






Detection probability versus SNR 



Figure 1.13: Probability of detection of fully adaptive STAP (M=10,N=10). 


Detection probability versus SNR 



Figure 1.14: Probability of detection of fully adaptive STAP (M=5,N=5). 
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Detection probability versus SNR 



Figure 1.15: Probability of detection of partially adaptive STAP (M=10,N=10). 



SNR(dB) 


Figure 1.16: Probability of detection of partially adaptive STAP (M=5,N=5). 
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Filter bank detection probability versus SNR 



Figure 1.17: Comparison of the probability of detection between fully and partially adaptive 
STAP (M=10,N=10) 


Filter bank detection probability versus SNR 



Figure 1.18: Comparison of the probability of detection between fully and partially adaptive 
STAP (M=5,N=5) 
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Part 2 


Robust Detection and Estimation 
with Alpha-Stable Distributions for 
STAP Applications 


Principal Investigator: Professor Chrysostomos L. Nikias 

Collaborators: Russell Lambert, Xinyu (Jack) Ma, Dr. Panagiotis Tsakalides, 
and Professor George A. Tsihrintzis 


Spikes due to clutter sources such as mountains, forests or ocean waves, and glints due to reflec¬ 
tions from large flat surfaces such as buildings or vehicles are usually present in radar returns. 
Their presence obscures the target detection capability of a radar system and degrades the sys¬ 
tem’s performance. Preprocessing to suppress the spikes/glints in the radar returns and minimize 
their effect on the radar target detection performance can be efficiently done only on the basis of 
appropriate statistical modeling. 

Very recently, a statistical model of impulsive interference has been proposed, which is based 
on the theory of symmetric alpha-stable (SaS) random processes [1]. The model is of a statistical- 
physical nature and has been shown to arise under very general assumptions and to describe a 
broad class of impulsive interference. In particular, it has been shown in [1] that the first order 
distribution of the amplitude of the radar return follows a SaS law, while the first-order joint 
distribution of the quadrature components of the envelope of the radar return follows an isotropic 
stable law. This model is mathematically more appealing than existing models for impulsive 
interference. 

In this part, we address the solution of the target detection and signal parameter estimation 
problems through the use of radar array sensor data retrieved in the presence of impulsive in¬ 
terference (noise, clutter, or jamming.) First, we describe new methods on the modeling of the 
amplitude statistics of airborne radar clutter by means of alpha-stable distributions. Then, we 
introduce target detection and joint angle and Doppler estimation techniques from radar measure¬ 
ments retrieved in the presence of impulsive noise modeled as a multivariate sub-Gaussian ran¬ 
dom process. The results are of great importance in the study of space-time adaptive processing 
(STAP) for airborne pulse Doppler radar arrays operating in impulsive interference environments. 
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This part includes the results in the area of alpha-stable modeling and is organized as follows: 

1 . Introduction of the statistical model, based on the class of symmetric cv-stable (SaS) dis¬ 
tributions [Section 2.1]; 

2 . Characterization and development of methods for parameter estimation of SaS distri¬ 
butions by means of order statistics, fractional lower-order and negative-order moments. 
Modeling of the amplitude statistics of radar clutter by means of SaS distributions and 
estimation of the parameters of the stable distributions from real clutter of the Mountain 
Top Database [Section 2.2]; 

3. Development of an adaptive matched-filter detector for the case of alpha-stable, sub- 
Gaussian noise. Comparison of the performance of the proposed detector to the that of 
the optimum detector under completely known signal and Gaussian noise characteristics 

[Section 2.3]; 

4. Development of the Cauchy Beamformer for joint spatial- and Doppler-frequency maximum 
likelihood estimation. The Cauchy Beamformer is shown to be very robust in a wide range 
of additive impulsive noise environments. [Section 2.4.1]; 

5. Development of a new joint spatial- and doppler-frequency high-resolution estimation tech¬ 
nique based on the eigendecomposition of the covariation matrix of the space-time radar 
measurements [Section 2.4.2]. 

2.1 Alpha-Stable Random Variables and Processes 

Gaussian distributions and processes have long been accepted as useful tools for stochastic mod¬ 
eling. In this section, we introduce a statistical model based on the class of symmetric a-stable 
(SaS) distributions which is well-suited for describing signals that are impulsive in nature. A 
review of the state of the art on stable processes from a statistical point of view is provided by 
a collection of papers edited by Cambanis, Samorodnitsky and Taqqu [2]. Several statisticians 
including Cambanis, Zolotarev, Weron, et al have published extensively on the theory and appli¬ 
cations of stable processes. They studied the properties of stable processes [3, 4, 5], their spectral 
representation [ 6 , 7], as well as prediction and linear filtering problems [ 8 , 9]. Textbooks in the 
area were written by Samorodnitsky and Taqqu [10] and by Janicki and Weron [ 11 ]. An extensive 
review of stable processes from a signal processing point of view can be found in a tutorial paper 
by Shao and Nikias [12] as well as in a monogram written by the same authors [1]. 

2.1.1 The Class of Real SaS Distributions 

The symmetric o-stable (SaS) distribution is best defined by its characteristic function 

<p{w) = exp (jSu - 7 M a ), ( 2 . 1 ) 

where a is the characteristic exponent restricted to the values 0 < a < 2 , 6 (—oo < S < oo) is 
the location parameter , and 7 (7 > 0) is the dispersion of the distribution. For values of a in the 
interval ( 1 , 2 ], the location parameter S corresponds to the mean of the SaS distribution, while for 
0 < a < 1, S corresponds to its median. The dispersion parameter 7 determines the spread of the 
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distribution around its location parameter S , similar to the variance of the Gaussian distribution. 
The characteristic exponent a is the most important parameter of the SaS distribution and it 
determines the shape of the distribution. 

A stable distribution is called standard if 8 = 0 and 7 — 1 . Clearly, if a random variable X 
is stable with parameters or, 7 , 5, then (X - 5)/y l ! a is standard with characteristic exponent a. 
The standard SaS density functions for a few values of the characteristic exponent a are shown 
in Figure 2.1. 

By letting a take the values 1 and 2 , we get two important special cases of SaS distributions, 
namely, the Cauchy (ce = 1 ), and the Gaussian (a ~ 2): 


Cauchy 


Gaussian 




1 7 


7T 7 2 + [x — 6) 


( 2 . 2 ) 


r * 1 r 

/ 2 ( 7 »*;a)= - 7 =exp [-— 7 — J. 
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(2.3) 


Unfortunately, no closed form expressions exist for general SaS distributions other than the 
Cauchy and the Gaussian. However, power series expansions can be derived for f a (y,S]x) [12]. 
Although the SaS density behaves approximately like a Gaussian density near the origin, its 
tails decay at a lower rate than the Gaussian density tails. While the Gaussian density has 
exponential tails, the stable densities have algebraic tails (cf. Figure 2.2). The smaller the 
characteristic exponent a is, the heavier the tails of the SaS density . This implies that random 
variables following SaS distributions with small characteristic exponents are highly impulsive. It 
is this heavy-tail characteristic that makes the SaS densities appropriate for modeling signals 
and noise or interference which are impulsive in nature. 

SaS densities obey two important properties which further justify their role in data modeling: 

• The stability property , which states that the random variables X \,..., X n are independent 
and symmetrically stable with the same characteristic exponent a if and only if for any 
constants ai,..., a n , the linear combination Ya= 1 a iX% 1S a l so SaS] 


• The generalized central limit theorem , which states that the family of stable distributions 
contains all limiting distributions of sums of i.i.d. random variables. 

An important difference between the Gaussian and the other distributions of the SaS family 
is that only moments of order less than a exist for the non-Gaussian SaS family members. The 
fractional lower order moments (FLOM’s) of a SaS random variable with zero location parameter 
and dispersion 7 are given by: 

E\X\ P = C(p, a) 7 a for 0 < p < a (2.4) 


where 

C(p, a) 


and r(-) is the Gamma function. 


2 p +1 r(2±i)r(-f) 


<V5rT(-§) 
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2.1.2 Symmetric Sub-Gaussian Alpha-Stable Processes 

A collection of random variables (A(f), t € T} where T is an arbitrary index set, is said to 
be a SaS stochastic process if for all combinations of distinct indices € T, the ran¬ 

dom variables A(U ),.. ,,X(t n ) are jointly SaS with the same characteristic exponent a. The 
stochastic process {X(t), t € T) is stationary if the random vectors (X(ti ),..., X(t n )) and 
(A(U + 5 ),..., X ( t n + s )) are identically distributed for each choice of s,ti, • •.» t n € T. The fam¬ 
ily of stable processes has many members with mutually exclusive properties. In the following, 
we present the important class of sub-Gaussian stable processes which we will use extensively. 


Sub-Gaussian Processes 


A stable process (A(i), t £ T} is said to be an a-sub-Gaussian process (a-SG(£)) if for distinct 

indices t u ■ ■ .,t n € T, the random vector A = [A(£i), . ..,X(t n )] T has characteristic function of 

the general form 

0(u>) = expf-^w 7 ’^)"/ 2 ], (2.6) 

r l r P 

where R is a positive-definite matrix, called the underlying matrix of the process, u — [o>i,..., u > n J , 
and a Takes values in ( 1 , 2 ]. When 0 = 2 , A is a Gaussian vector with zero mean and covari¬ 
ance matrix R. Unfortunately, closed-form expressions for the joint pdf of sub-Gaussian random 
vectors are known only for the Gaussian (o = 2 ) and Cauchy(<a = 1) cases: 


fa (20 


fc(X) 


exp(—A r fi 1 A) 

ciiiir 1/2 

[1 + A T ^ _1 A] (L+1) / 2 


(Gaussian) 
(Cauchy), 


(2.7) 

( 2 . 8 ) 


where L is the length of the random vector, ] |_£| | is the determinant of^, and c = ^71772 r( : 2 ')- 
Sub-Gaussian processes share many common features with Gaussian processes. In fact, 
sub-Gaussian processes are variance mixtures of Gaussian processes [13]. Specifically, any sub- 
Gaussian random vector can be expressed in the form 


A = w^G, 



where w is a positive f-stable random variable [14] and G is a Gaussian random vector of mean 
zero and covariance matrix R. An important distinction between Gaussian and sub-Gaussian 
processes is that, while linear spaces of Gaussian random variables may contain non-degenerate 
independent elements, sub-Gaussian random variables cannot be independent [15]. 

Sub-Gaussian ScrS processes combine the capability to model statistical dependence with the 
capability to model the presence of outliers in observed time series of various degrees of severity. 
The example in Figure 2.3 is indicative of the concept. Consider a sub-Gaussian vector of length 
L = 100 and diagonal underlying covariance matrix R — diag {1,1,..., 1}. Typical realizations 
of the vector are shown for characteristic exponents a = 2 and a — 1.5. Clearly, it is difficult 
to distinguish one vector from the other visually. However, if we look over 1000 independent 
realizations of the first component of the vector, a clear difference is observed. 

In modeling the signals and/or noise for the parameter estimation problem, we need a complex 
model for the noise samples. We also need to define quantities which describe correlations between 
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random variables. In the following section, we present the family of isotropic complex SaS 
distributions and describe their correlation properties by means of the covariation quantity. 


2.1,3 Complex SaS Random Variables and Covariations 

A complex random variable (r.v.) X = X\ + jX 2 is symmetric o-stable (SaS) if X\ and X 2 are 
jointly SaS and its characteristic function is written as 



= £^exp[j3?(a;V*)] = Eexp[](iOxXi + lo 2 X 2 )] 
= exp |wi&i + U) 2 x 2 \ a dr Xu x 2 ( x u x 2 )^ , 


( 2 . 10 ) 


where u> = uq +ju 2j §R[*] is the real part operator, and Tx lt x 2 ls a symmetric measure on the unit 
sphere S 2 , called the spectral measure of the random variable X . A complex random variable 
X = X\ + jX 2 is isotropic if and only if (X\,X 2 ) has a uniform spectral measure. In this case, 
the characteristic function of X can be written as 


cp(u) = E exp(ffi[uX*}) = exp(-~ 7 |u;| a ), 


( 2 . 11 ) 


where 7 (7 > 0 ) is the dispersion of the distribution. 

In the theory of second-order processes, the concept of covariance plays an important role in 
problems of linear prediction, filtering and smoothing. Since SaS processes do not possess finite 
jpth order moments for p > a, covariances do not exist on the space of SaS random variables. 
Instead, a quantity called covariation plays an analogous role for statistical signal processing 
problems involving SaS processes to the role played by covariance in the case of second-order 
processes. 

Several complex r.v.’s are jointly SaS if their real and imaginary parts are jointly SaS. When 
X = Xi + jX 2 and Y = Yj + jY 2 are jointly SaS with 1 < a < 2 , the covariation of X and Y is 
defined by 


[X,Y] a = / (®i +jx 2 )(yi + jy 2 ) <a l> dTx u x 2 ,Y 1 ,Y 2 {xi,X‘ 1 ,yi,y 2 ), 
Js 4 


( 2 . 12 ) 


where we use throughout the convention 


y</3> _ |y|/?-ly : 


(2.13) 


It can be shown that for every 1 < p < a, the covariation can be expressed as a function of 
moments [ 8 ] 

/?yy<P“i> 

[X,Y] a = .7 Y, (2-14) 


E\Y\ 


P 


where 7 y is the dispersion of the r.v. Y given by 

jh = 

Y C(P, a) 


for 0 < p < a, 


(2.15) 


with 


C(p, a) = 


2 p+ i r (s±2)r(-§) 

«r(i)r(-f) 


(2.16) 
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Obviously, from (2.14) it holds that 

[X,X] a = y x . 

Also, the covariation coefficient of X and Y is defined by 


(2.17) 


^X,Y = 


[.x , y] tt 

p'.n * 1 


(2.18) 


and by using (2.14), it can be expressed as 


EXY <p ~ l> 

*x,y = g|y|p . - for 1 < p < a. 


(2.19) 


The covariation of complex jointly SaS r.v.’s is not generally symmetric and has the following 
properties [3]: 


PI If X u X 2 and Y are jointly SaS, then 


[aX x + bX 2 , Y] a = a[X u Y] a + b[X 2 , Y] a (2.20) 

for any complex constants a and b. 

P 2 If Y\ and Y 2 are independent and X\, X 2 and Y are jointly SaS , then 

[aX l ,bY 1 + cY 2 ] a = ab <a - 1> [X 1 ,Y l ] Q + ac <a - 1> [X 1 ,Y 2 ] a (2.21) 

for any complex constants a, b and c. 

P 3 If X and Y are independent SaS , then [X,Y] a = 0 

P 4 Let {{/,•; i = 1 be independent complex SaS r.v.’s with dispersions 7 p For any 
complex numbers {a,-, i = 1 ,..., n} form 

X = a\Ui + -h a n ll n , 


( 2 . 21 ) 


Then 


Y = b\Ui + • • • + b n U n . 

[X , X] a = 7i|ai| a + ■ • • + TnKr, 
[y,y] a = 7il^r + --- + 7n|&r l | a , 

[X, y] a = 7ini&f a_1> + • • • + 7„a„6< af_1> 


( 2 . 22 ) 


2.1.4 Generation of Complex Isotropic Sa# Random Variables 

The generation of complex isotropic SaS deviates of characteristic exponent a is based on the 
following proposition found in [ 10 ]: 

Proposition 2.1 A complex SaS (a < 2) random variable X = X\ jX 2 is isotropic if and 
only if there exist two i.i.d. zero-mean Gaussian variables G\ and G 2 and a real stable random 
variable A of characteristic exponent a/2, dispersion cos 2 (7ra/4) and skewness /5 = 1 (we write 

A ~ S a / 2 {co$ 2 ( 7 ra/ 4 ), 1)), independent o/(G y i,G 2 ) such that (Xi,X 2 ) = (A 1 / 2 Gi, A l / 2 G 2 ). 
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We say that the vector (Xi, X 2 ) is sub-Gaussian with underlying vector (Gi, G 2 ). It can be shown 
that the real and imaginary parts of X are always dependent, unless G\ and G 2 are degenerate. 
Hence, every complex isotropic SaS random variable with a < 2 can be expressed as 

X = A l t 2 (G\ + . 7 G 2 ), (2.23) 


and its generation involves the generation of a real, totally skewed stable random variable. The 
problem of generating a real stable deviate is studied in [16] and [ 11 ]. Here, we present the result 

for easy reference. 

To generate a real standard stable random variable A ~ 5<*(1,/?) of characteristic exponent 
a, skewness /? and unit dispersion 7 = 1 , the following representations can be deduced: 


S(°'M = D«, e (cost/)I/ « 


sin a(U-U 0 ) fcos(U - a{U - Up)) 


1 —Q 

a 


W 


for O' 7 ^ 1 , 


(2.24) 


and 


S( 1 ,/?, 1) = f 


(- + /3U) tan U — p\n( 

2 


fWcosf/ 

% + pu 


(2.25) 


where W is standard exponential with Pr{W > w} = e~ w , w > 0, and U is uniform on (-§, f). 
Also, Da,0 = [cos(arctan(/ 3 tan( 7 ro/ 2 )))]- 1 / a , and U 0 = -%P[k{ot)/a] with k(a) = 1 - |1 - o|. 
Then, a stable variate, Ai, of dispersion 7 can be obtained from A by Ai = 7 1 /,a A. 


2.2 Parameter Estimation with Fractional Lower-Order Mo¬ 
ments 


The moment theory for SoS processes is based on fractional lower-order statistics. The fractional 
lower-order moment of a SaS random variable X satisfies: 


E(|X| P ) = Ci(p, a)j p ^ a , for - 1 < p < a, 


(2.26) 


where C x (p, a) = is the characteristic exponent (0 < a < 2), 7 is the dispersion 

and T(-) is the Gamma function. When X is an n dimensional spherically symmetric SaS random 
variable, E(|X| P ) is given in [5]: 


E(|XP) = 2 P 


r(^)T(i - f) , /a 

r (i - f)r(f) 7 ’ 



— n < p < a. 


(2.27) 


Specifically, when X is an isotropic bivariate SaS random variable, we have: 

E(|X| P ) = C 2 (p, a) 7 p/a , for - 2 < p < a, 


(2.28) 


where C 2 (p, a) = 2 P — The simplest yet most important class of complex SaS random 

A v- 1 2 / 

variables is the class of isotropic bivariate SaS . 

Notice that the value of p (the order of moment) can be both negative and positive. Using 
this property, we can estimate the parameters (a, 7 ) of SaS processes [17]. 
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Sine Function Estimator: 

Letting 0 < p < min (a, 1), such that both positive- and negative-order moments are finite, 
then: 

E(m»)E(|X|-») = (2.29) 

a sm{pir/a) 

i.e., a can be found by solving the following sine function equation: 


(2.29) 


sinc(—) = f n L a t = —— 7 » ® < P < m i n !)• 

« (-rfe) p*mmm- p v 


(2.30) 


The above equation does not involve 7 . Once a is estimated, 7 can obtained from (2.26): 


7 = 


E(jgm a/p 

Clip, a)) 


(2.31) 


log |SaS| Estimator: 

By rewriting E(|X| P ) as E(e plog l x l) and by defining the new random variable Y = log |X|,* 
we have that: 

E(|X| P ) = E(e plog|x| ) = E(e pF ), -1 < p < a, (2.32) 

where E(e py ) is the moment-generating function of Y. By expanding E(e py ) into power 


series: 


k=0 




(2.33) 


and by using with (2.26), we can see that the moments of Y of any order must be finite 
and they satisfy: 

E ( Yk ) = -fk{c i(P.«)7 p/a )l P =o- (2-34) 


Simplifying the above equation, we have [18]: 


E(F) = C e (i - 1) + — log 7 , 

o a 


(2.35) 


where C e = 0.57721566 • • • is the Euler constant, a is the characteristic exponent, 7 is the 
dispersion, and 

Var(Y) = E{(Y - E{Y}) 2 } = 4 (4 + z) > ( 2 -36) 


Var(Y) = E{(Y - E{Y}) 2 } = y + 1) , 
E{(Y — E{Y}) 3 } = 2((3) (L-l), 


(2.37) 


where £(•) is the Riemann Zeta function, and £(3) is a constant: £(3) = 1.2020569 •• •. 
The higher-order moments of Y always exist and starting from the second-order moment, 
the equations only involve a. This property provides a simple algorithm to estimate the 
parameters (<*, 7 ) of a SaS random process. Since we can estimate the mean and variance 
of y by: 

e£i y < -2 eLiu -?) 2 

r = N • °Y- -jvn-’ t 2 - 38 ' 


(2.38) 


Note that log |X| is bounded because the p.d.f. of X f(x) is bounded at x = 0. 
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Table 2 . 1 : Performance comparison of the log |SaS| Estimator versus the Sine Function Estimator. 
The sample size is 5000, the true values are a = 1.5 and 7=1. 


Estimation Method 

a 

a 

A . 

7 

log|SofS| Estimator 

—| 

0.9989 

(0.0385) 

Sine Function Estimator 

IfiliWi 

1.0023 

(0.0423) 


where N is the number of i.i.d. samples of {F;}, by solving (2.36) we can obtain an 
estimate of a and substitute into (2.35) for an estimate of 7 . The log |ScrS| process was 
first introduced by Zolotarev [5]. 

Table 2.1 shows the computer simulation results for the above two estimators. As we can see, both 
estimators have mean value close to the true one and low variances (numbers in parentheses.) 

2 .2.1 Mountaintop Data Analysis: Clutter Modeling Using SaS Theory 

The data files used in our experiments are from the Mountaintop Data Package tape. The clutter 
data set contains measured monostatic clutter using the Radar Surveillance Technology Exper¬ 
imental Radar (RSTER). The data consist of four files: cm435al.bfr, cm435bl.bfr, cm435cl.bfr, 
and ncal435a.bfr. These data files were collected on August 27, 1993 as part of the Clutter Map 
Multi-Frequency Experiment. The data are unequalized I/Q data and the waveform employed 
was a 5 fiSec pulsed CW signal at 435 MHz. file cm 435 al covers from 3.5 to 43.5 nmi in range. 
File cm435bl covers from 40 to 80 nmi in range. File cm435cl covers from 75 to 115 nmi in range. 
Each range has a sample size of 495. All of these files have enough CPIs to cover 360 degrees in 
azimuth. The file ncal435a contains the system noise. In our comprehensive study, we conduct 
our experiments in 24 different azimuth angles with 15 degrees interval. At each azimuth angle 
of interest, we concatenate the data from the three ranges to form a single long stream (1485 
samples). Experiments with the actually raw data are performed with the I/Q clutter data on 
Sensor # 1 (There are 14 sensors in the linear array). For the SaS modeling, the parameters 
(characteristic exponent a and dispersion 7 ) are first estimated using fractional lower order mo¬ 
ments, then estimated by amplitude probability density (APD) curve fitting using exhaustive 
search. Noting that these parameters should remain the same for both the I- and Q-components 
of the clutter data, in the analysis of APD we focus on the I-component for simplicity. In the 
Gaussian model, the variance is estimated by taking standard deviation of the data. The samples 
are assumed to be i.i.d. and for convenience,the data have been scaled by 10 5 . In Figures 2.4 to 
2.17, we show the I/Q data time series, estimated parameters, and APD comparison along several 
azimuth angle values at 15, 45, 60, 105, 120, 240, and 315 degrees. The experiments demonstrate 
that the SaS distribution is superior to the traditional Gaussian distribution for modeling the 
actual radar clutter data. 
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2.3 Signal Detection in Sub-Gaussian Impulsive Interference 

In this section, we address the problem of coherent detection of a signal embedded in heavy¬ 
tailed noise modeled as a sub-Gaussian, alpha-stable process. We assume that the signal is a 
complex-valued vector of length L ) known only within a multiplicative constant. The dependence 
structure of the noise, i.e., the underlying matrix of the sub-Gaussian process, is not known. The 
intent is to implement a generalized likelihood ratio detector which employs robust estimates 
of the unknown noise underlying matrix and the unknown signal strength. The performance 
of the proposed adaptive detector is compared to that of an adaptive matched filter that uses 
Gaussian estimates of the noise underlying matrix and the signal strength and is found to be 
clearly superior. The proposed new algorithms are theoretically analyzed and illustrated in a 
Monte-Carlo simulation. 

2.3.1 Estimation of the Underlying Matrix of a Sub-Gaussian Vector 

Before we attack the signal detection problem in sub-Gaussian interference, we present some 
necessary theory in the estimation of the underlying matrix of a sub-Gaussian vector. The 
following proposition found in [10, pp. 89] expresses the underlying matrix of a sub-Gaussian 
vector in terms of its covariation matrix and can, therefore, be used to obtain high quality 
estimates of the underlying matrix of the vector from independent observations. 

Proposition 2.2 Let X = [X\, X 2 , ..., Xl] t be a sub-Gaussian random vector with underlying 
covariance function R (cf. (2.6)). Then, its covariation matrix Q_ will consist of the elements 

Cij = [Xu Xj\ Q = 2 -iRijRjf (2.39) 

The usefulness of the proposition lies in finding consistent estimators of the underlying co- 
variance matrix of a sub-Gaussian vector from independent realizations X }, Xf ,..., X K of the 
vector. The elements C ZJ of the covariation matrix Q_ can be estimated by combining moment 
approximations of the expressions (2.14), (2.15), and (2.16) as follows 

Cij = C(p , a)[L J2 X*(X*)<>- x >rl>[L £ |XflT /p_1 , (2.40) 

k —1 k ~1 


where any p < ~ will result in a consistent estimate^ [19], as shown in the following proposition. 

Proposition 2.3 The estimator C\j of the covariation matrix elements given in (2.f0), where 
p < a/2, is consistent and asymptotically normal with mean Cij and covariance £{(Cij — 
Cij)(C\ m — Cim)*} as in Appendix A. 

Proof See Appendix A. | 

Equation (2.39) can now be used to compute an estimate of the underlying matrix R from 
the estimate of the covariation matrix C_. 

*We have empirically found that a good choice is p = j. 
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Proposition 2.4 Let Cij be the estimator of the covariation matrix elements given in (2.40), 
where p < a/2. The estimates 



A 



Q A A 

2 zCij/R 


a—2 


n 


(2.41) 

(2.42) 


of the elements of the underlying covariance matrix are consistent and asymptotically normal 
with means R~j and Rij, respectively, and variances as in Appendix B. 


Proof See Appendix B. | 

The procedure is illustrated with the following simulation study: Consider a sub-Gaussian 
random vector of length L = 64 and underlying matrix Q. = diag {1,1,..., 1}. We assume that 
K = 500 independent realizations of the vector are available and plot the Z2 nd row of the mean 
over 100 Monte-Carlo simulations of the following two estimates: 


k = fy^ X k X k T (2.43) 

K k= 1 

R = as obtained from covariation matrix estimates (2.42). (2.44) 

We examined the cases of a = 2 and a = 1.5. Clearly, the Gaussian estimate fails when a = 2, 
while the covariation-based estimate maintains high performance in both the cases of a = 2 and 
a — 1.5. 

Next, we consider the estimation of the amplitude of a signal of known shape embedded 
in sub-Gaussian noise from a number of independent observations. The following Proposition 
outlines the procedure and states its performance. 


Proposition 2.5 Consider the collection of K vectors X k = As 4 - N k , k — 1,2,..., A, where 
>T n ~ 1 . Form the least-squares estimates Ak = s T X k = s T As + s T N k = A + s? N_ k , k — 


s s 


1 , 2 ,..., K. Define A = sm {Ax, A 2} ..., Ak}, where sra{-**} indicates the sample median 
of its arguments. The estimate A is consistent and asymptotically normal with mean equal to 

the true signal amplitude A and variance a) } 2 > w h ere T = 2 «(5Z*=i s i s jRij ) 2 = 

2~ a (s T Rs) f . 


Proof The random variables Ak, k = 1,2,..., A, will be independent, each of pdf /(#), which 
can be computed as follows. We begin with 

A k = A + s T N k = A + w2s T G k , 


where G k , k = 1 , 2 ,...,A, are independent Gaussian random vectors, each of mean zero and 
covariance matrix R. Therefore, s?G k , k = 1 , 2 ,..., A, are independent Gaussian random vari¬ 
ables of mean zero and variance s T Rs % which implies that s T N _ k , k = 1 , 2 ,A, are indepen¬ 
dent sub-Gaussian random variables of length L — 1 and dispersion 7 = 2 “« (s T As)^. Thus, 
f(%) = fail, A] x). 
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From [20, p. 369], it follows that the sample median of A k , k = 1,2,..., I<, is asymptotically 
(for K —)■ oo) normal with mean equal to the true median (A) and variance 

fail, 5; x) = T l/a f a [ 1,0; (x - 5)i~^ a ] and / a (l,0;0) = ^T(£) [21], Combining the last two 
relations, we get 

1 r 1 ,2 = 1 r 2 

K zf a {i, A; A) A' L 2r(l/«) J 

A 

as the asymptotic variance of the estimator A. | 


2.3.2 Data-Adaptive Algorithms for Coherent Signal Detection 

We consider the hypothesis testing problem 

Ho : X k = N k 

k = 1,2, ...,/<■ 

#i : X = S + N k , 

where all the vectors have dimension (length) L and k = 1, 2,..., K indexes independent, iden¬ 
tically distributed realizations. 

We make the following assumptions: 

1. The noise vectors N k have a sub-Gaussian distribution, i.e., 


N k = w\G k , 

where Wk is a positive (a/2)-stable random variable of unit dispersion, G is a Gaussian 
random vector of covariance matrix and w and G & re independent. 

2. The signal vector S = As consists of a known shape s (for which s T s = 1) and an unknown 
amplitude A. 

The proposed test statistic is a generalized likelihood ratio test that makes use of the multi¬ 
dimensional Cauchy pdf defined in (2.8): 


r 1 + x T R x a 

tc = V log[ - : -Tr i-r~J 

frx 1 + (X - As) T R (X - As) 


(2.45) 


For the estimates R and A, we choose the estimates proposed in (2.42) and Proposition 2.5, 
respectively. Assuming Gaussian noise of unknown covariance matrix R and unknown signal 
amplitude, the data-adaptive detector attains the form of an adaptive matched-filter, i.e., it 
computes the test statistic 


t G = (2/A) E(^) t E A - |A|VE 5, 


? —1 


(2.46) 


k= l 


where A = (1/A) £fc=i 


A r BT l Kf 

s T Br l s 


and k = (1/A) Ek=iiX - Is) (A - ls) T . 
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2.3.3 Performance Analysis and Computer Illustration 
A. Theoretical Performance Analysis 

The theoretical performance of the described detection algorithm is analyzed by means of the 
following propositions. 


Proposition 2.6 Assume that the signal amplitude A and the noise underlying matrix R were 
known, while the test statistic 


* 1 + XjR~ l X 

tc ' l0g[ 1 + (X - As) T R~ 1 (X - As) 


(2.47) 


were used for detection purposes. The asymptotic (for K oo) receiver operating characteristic 

would then he _ 

_ 1 f erfc- 1 (2 P c fa ) + m Ho -m H 

Pd = o erfc [~--J> ( 2 - 48 ) 

2 V 2 < 

where Pf is the probability of detection induced by the test statistic t'f. when operating at a 
probability of false alarm Pf a . In (248), we have made use of the notation m# 0 = £{t c c \Ho] < oo, 
m Hl = £{t c c \Hi} < oo, o 2 Ho = var {t c c \H 0 } < oo, o 2 Hi = var {t c c \Hi) < oo. 


Proof From the tail behavior of the sub-Gaussian distributions, we can show that the above- 
defined m# 0 , m//,, , crjv are finite. Therefore, as K H- oo, the test statistic t c c approaches 

a Gaussian random variable under either hypothesis. From this observation, the asymptotic 
performance follows. | 

Proposition 2.7 Let t c c be as in Proposition 2.6, (cf. (2.47)). The asymptotic (as I< oo) 
performance of the test statistic tc in (2-45) is the same as the asymptotic performance of the 
test statistic t c c that requires knowledge of the signal amplitude and the noise underlying matrix. 

Proof From Propositions 2.4 and 2.5, R and A are consistent estimators of R and A , respectively, 
as I< ->■ oo. Slutsky’s theorem [22, p. 461] guarantees that the asymptotic distribution of tc 
will be the same as the asymptotic distribution of t c c and subsequently tc will result in the same 
asymptotic receiver operating characteristic as t c c in Proposition 2.5. | 


B. Computer Illustration of the Detector Performance 

The small sample performance of both the Gaussian and the proposed Cauchy detectors can be 
accurately assessed only via Monte-Carlo simulation. To this end, we chose an observation vector 
of length L = 8 and I< = 10 independent copies of it, while for the signal we chose a shape of 
a square pulse of height 1/VL and an amplitude of A = 1. The sub-Gaussian interference was 
assumed to be of characteristic exponent a = 2,1.75,1.5,1.25,1, and 0.75, and underlying matrix 
R = diag {1,1,..., 1}. The performance of the Gaussian and the Cauchy detectors was assessed 
via 10,000 Monte-Carlo runs. 

In Figure 2.19, we compare the performance of the Gaussian and the Cauchy detectors for 
different values of the characteristic exponent a. We see that, for a = 2, the Gaussian detector, 
as expected, outperforms the Cauchy detector; however, for all other values of a, the Cauchy 
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detector maintains a high performance level, while the performance of the Gaussian detector 
deteriorates down to unacceptably low levels. In Figure 2.20, we show the performance of the 
Gaussian and the Cauchy detectors for different values of the characteristic exponent a. 


2.3.4 The Special Case of a Single Observation 


Consider the data model 


X = As + N_, 


(2.49) 


where A is an unknown signal amplitude, s is a known signal shape, and jV is a sub-Gaussian 
random noise vector. Let us for the moment assume that the underlying matrix of the distribution 
is known. Several immediate observations become apparent: The maximum likelihood estimate 

A 

A m i satisfies 

A ml = argma xJ(X-as), (2.50) 

where /(*) is the known sub-Gaussian distribution. We can make use of the representation (2.9) 
and proceed to write 

A m i = argmax a / f l (u)f 2 {X_ - as|u) du, (2.51) 

Jo 

where /i(-) is the distribution of the square root of a positive, ^-stable random variable and 
f 2 (X- as|u) = /g( ™ ~ ) with /g(-) the multivariate Gaussian distribution of covariance matrix 
R. Clearly: 

A m i = argmax a fa ( — — —), (2.52) 


u 


i.e., 


Anil 


?R~ l X 


s T R 


-l 


(2.53) 


A 

We have, in other words, shown that the maximum likelihood estimate A m i is the least-squares 
estimate, as in the Gaussian case; however, in the sub-Gaussian case, the estimate is not efficient 
and, in fact, contains infinite error variance. As a consequence, the estimate A m i is not consistent 
and the performance of the test statistic tc in which this estimate is used will not approximate 
the performance of the test statistic t c c in which exact knowledge of A (and R) is assumed. 


2.3.5 Summary and Conclusions 

In this section, we addressed the problem of detection of a signal, known within a multiplicative 
constant, in sub-Gaussian impulsive interference of unknown underlying matrix. We defined the 
sub-Gaussian class of random processes, summarized several key results, and presented data- 
adaptive algorithms for coherent detection of a known (within a multiplicative constant) signal. 
Our derivations were based on a solid mathematical basis. 

From this study, we found that the Gaussian detectors for the same problem deteriorate 
in performance when required to operate in sub-Gaussian interference. On the other hand, a 
detector based on the multidimensional Cauchy distribution exhibited resistance to the presence 
of the sub-Gaussian interference and high performance, comparable to the performance of the 
Gaussian detector in Gaussian interference. 
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2.4 Target Angle and Doppler Estimation for Airborne Radar 
in Impulsive Interference Modeled as a Stable Process 

In this section we develop target angle and Doppler, maximum likelihood-based estimation tech¬ 
niques from radar measurements retrieved in the presence of impulsive noise modeled as a multi¬ 
variate SaS random process. We derive the Cramer-Rao bounds for the additive SaS interference 
scenario to assess the best-case estimation accuracy which can be achieved. In addition, we in¬ 
troduce a new joint spatial- and doppler-frequency high-resolution estimation technique based 
on the fractional lower-order statistics of the measurements of a radar array. We define the co¬ 
variation matrix of the space-time radar observation vector process and employ subspace-based 
estimation techniques to the sample covariation matrix resulting in improved target-angle and 
Doppler estimates in the presence of impulsive interference. The results are of great importance 
in the study of space-time adaptive processing (STAP) for airborne pulse Doppler radar arrays 
operating in impulsive interference environments. 

2.4*1 Maximum Likelihood Estimation in Alpha-Stable Noise: The Cauchy 
Beamformer 

In this section, we develop the Maximum Likelihood (ML) estimator of the location and Doppler 
frequency of a target in the presence of noise modeled as a complex isotropic a-stable process. 
Initially, we concentrate on the additive complex Cauchy noise case. There are two reasons for 
doing this: First, the Cauchy distribution has a closed-form expression for its density function. 
This results in a straight-forward implementation of the maximum likelihood estimation, with 
closed form expressions for the Cramer-Rao bound. Secondly, it is shown through simulations 
that the Cauchy beamformer is very robust in different impulsive noise environments, i.e., its 
performance does not change significantly when the parameter a of the SaS noise varies in the 
interval [1,2]. 

A. Single Target Response Model 

Consider a uniformly spaced linear array radar antenna consisting of N elements, which transmits 
a coherent burst of M pulses at a constant pulse repetition frequency (PRF) f r and over a certain 
range of directions of interest. The pulse repetition interval is T r ( f r ~ l/T r .) A space-time 
snapshot refers to the MJV x 1 vector of samples corresponding to a single range gate. Given a 
target at angle (f) and Doppler frequency /, the space-time snapshot can be written as [23] 

x = /?v(</>, /) + n, (2.54) 

where (3 is the target’s complex amplitude given by 

/3 = x + jy. (2.55) 

The vector v is an NM x 1 vector called the space-time steering vector. It may be expressed as 

v(tf,/) = b(/)®a(0 (2.56) 

where a (<j)) is the N x 1 spatial steering vector containing the interelement phase shifts for a 
target at 4>, and b(f) is the M X 1 temporal steering vector that contains the interpulse phase 
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shifts for a target with Doppler /. It is assumed that the functional form of is known. 

For a linear radar array whose elements are spaced a half-wavelength apart, the following two 
vectors are defined: 

• Spatial steering vector: a(^) = [1, e^,..., e J ( N ~ l ^] T , where ip = cos(<^>) is the normal¬ 

ized spatial frequency of the target, and 

• Temporal steering vector: b(w) = [l.e^,..., w j iere u _ 2*1 \ s the normalized 

Doppler frequency of the target. 


The snapshot contains a noise component n which includes clutter, jamming, thermal noise, 
and any other undesired signals. We model n as a multivariate Cauchy process with pdf given 
by (cf. ( 2 . 8 )) 

„ , _ m\~ m In 57 , 

/(n)_ [l + n’’S- 1 n](* f ' v +»/ 2 ’ ' 

where R is a positive-definite matrix which models the statistical dependence of the impulsive 
noise process, and c = v{M n+i )/2 T ( m ^ +1 ). As a first approximation to the problem, we will 
assume that the noise present at the array is statistically independent both along the array 
sensors and along time. In this case, each component of the noise vector is modeled as a complex 
isotropic Cauchy process with marginal pdf given by 


Xy(r) = 


(2.58) 


27r(r2 + 7 2)3/2’ 

where 7 is the noise dispersion. Under the independence assumption, it follows from (2.54) and 
(2.58) that the joint density function for the case of a single snapshot is given by [24] 


MN 


MN 


/(») = n 


1=1 


(2n) MN ( 7 2 + \x. 


)3/2 ’ 


(2.59) 


where Vi(-if},Lo) is the ith component of the space-time steering vector v(V’,w) 

The estimation problem involves four real valued parameters which we can arrange to form a 
4x1 parameter vector 

© = [0j 0 2 #3 ^ 4 ] = [V> w x y]- (2.60) 

Given a single snapshot x and ignoring the constant terms, the likelihood function L(Q), is given 

by 

O MN 

L(&) = --£ log ( 7 2 + | Xi - (3vi(®)\ 2 ) (2.61) 

i =1 


B. Cramer-Rao Bound Analysis 

A 

The Cramer-Rao bound (CRB) for the error variance of an unbiased estimator © is given by 

£{(©-©)( 0 - 0 ) T } > J -1 (©), (2-62) 

where J (0) is the Fisher information matrix given by 

J(0) = E{[dL(®)/d®][dL(G)/d®] T }. (2.63) 

Under the assumptions stated in Section 2.4.1, and for the case of complex isotropic Cauchy 
noise, the following theorem holds: 
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Theorem 2.1 The CRB on the error variance of the target angle and Doppler frequency f 
estimates is given by 


rRRU , t A2 5AT(|1 d 6 || 2 -<5 2 /M) 1 

W - |0|2 (2^)2 • 3£ ' cos 2 (<£) ’ 


(2.64) 


and 


where 


and 


CRB(f) 


7 


2 1 


|/3| 2 (27rT r ) 2 


5M(|| d° || 2 -81/N) 


N 


d a = da/d<t>, 


ia 


! = Yj\ da i/ d< t>?, 

i= 1 


M 


d 6 = db/df, || d 6 1| 2 = \dbi/doj\ 2 , 


i= 1 


<5 a = 


iV M MN 

I, «6 = £kl. p = £K(oII4(oI* 


i=l 


i=i 


^ = (M || d a 


2 -^<5 2 )(7V || d 6 


2=1 

2 -£sl) -(SA-p) 


(2.65) 


Proof See Appendix C. | 

We should note that the above bounds can be achieved only when there exist unbiased esti¬ 
mators of all the model parameters. A useful insight on the CRB can be gained if we consider 
the case of a single source impinging in a linear array whose sensors are spaced a half-wavelength 
apart. In this case, 


CRB{cj>) = 


T 2 A 2 20 

|/?| 2 (27rd) 2 * M 2 N 2 (N 2 — 1) 


1 

cos 2 (^>) ’ 


( 2 . 66 ) 


and 


CRB(f) = 


7 2 1 

W (2ttT r ) 2 


20 

N 2 M 2 (M 2 — 1) ’ 


(2.67) 


The term 7 2 /|/?| 2 in the above expressions for the CRB can be viewed as the inverse of a quantity 
analogous to the signal-to-noise ratio (SNR) for the Gaussian case, i.e., a generalized SNR, so to 
speak. The larger the dispersion 7 of the noise, the higher the CRB. 


C. A Performance Study 

To demonstrate the performance of the proposed method for the target parameter estimation 
problem, we conducted several simulation experiments where we compared the ML estimator 
based on the Cauchy noise assumption (MLC) with the ML estimator based on the Gaussian 
noise assumption (MLG), and with the MUSIC estimator. 

In all the experiments the array is linear with five sensors spaced a half-wavelength apart. A 
single signal impinges to the array from a stationary target located at a direction of 5°. At first, 
the signal is assumed to be known at the receiver and the ML method is applied to estimate the 
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Table 2.2: GSNR and average PSNR for different values of M. 



Number of snapshots, M 

M = 5 

M = 10 

M = 20 

0 

II 

M = 100 

GSNR [dB] 

22.6951 

22.7323 

22.7416 

22.5003 

22.6543 

PSNR [dB] 

7.3712 

3.3938 

0.5371 

-4.3158 

-7.149 


direction of arrival. Also, for the unknown signal case the pseudo ML method is applied by using 
a least-squares estimate for the signal. The noise is assumed to follow the bivariate isotropic 
stable distribution. 

In every experiment we perform 500 Monte-Carlo runs and compute the mean square error 
(MSE) of the direction-of arrival (DOA) estimates. The optimization for the MLC and MLG 
methods is performed by a steepest-descent algorithm with variable stepsize selected by means of 
Armijo’s rule [25]. Since the alpha-stable family for a < 2 determines processes with infinite vari¬ 
ance, we define two alternative signal-to-noise ratios (SNR’s). Namely, we define the Generalized 
SNR (GSNR) to be the ratio of the signal power over the noise dispersion 7 : 

, M 

GSNR= I01og(—^>(i)| 2 ). 

Also, for finite sample realizations, we define the Pseudo-SNR (PSNR) as: 

PSNR = 101og( ^ 1 | S ^|\ 

Et=iW*)r 

In this example we study the estimation accuracy of MLC, MLG and MUSIC as a function 
of three parameters, namely the number of snapshots M, the noise dispersion 7 , and the noise 
characteristic exponent a. 

Number of snapshots M. In the first experiment we study the influence of the number 
of snapshots M to the performance of the algorithms. The noise follows the complex isotropic 
Cauchy distribution with dispersion 7 = 1 . For this experiment, the GSNR is kept almost 
constant at 22.5 dB as shown on Table 2 . 2 . The PSNR is different in every Monte-Carlo run, so 
we calculate the average PSNR over the 500 Monte-Carlo runs (cf. Table 2.2). As the number of 
snapshots M increases, the PSNR decreases because more and more impulsive noise samples are 
incorporated into the data. 

Figure 2 . 21 (a) shows the resulting MSE of the estimated DOA as a function of the number of 
snapshots when the signal is known. The CRB is also plotted. As expected, the MLC estimator 
has the best performance since it is the optimal estimator for this type of noise. Also, the 
complete failure of the MLC and MUSIC processors for this type of impulsive noise is apparent. 
Figure 2 . 21 (b) shows similar plots for the case of the pseudo ML estimators where we use a LS 
estimate for the signal. The MLC estimator has again the least MSE. Comparing these curves 
with the analogous curves obtained assuming exact signal knowledge, we observe a larger MSE 
for the pseudo ML estimates, as expected. 

Noise dispersion 7 . In the second experiment we study the influence of the noise dispersion 
7, i.e., the influence of the GSNR to the performance of the methods. Here also, the noise follows 


(2.69) 


( 2 . 68 ) 
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Table 2.3: GSNR and average PSNR for different values of 7 . 



Noise Dispersion , 7 


7 = 0.5 

7=1 

7 = 2 

7 = 4 

7=6 

7 = 8 

II 

O 

GSNR [dB] 

25.7519 

22.7416 

19.7313 

16.7210 

14.9601 

13.7107 

12.7416 

PSNR [dB] 

W&MrkM 

0.5371 

-5.4835 

-11.5041 

-15.0259 

-17.5247 

-19.4629 


Table 2.4: GSNR and average PSNR for different values of a. 



Noise Characteristic Exponent, a 

a = 1.0 

a = 1.2 

a = 1.4 

to 

• 

r—1 

II 

3 

a = 1.8 

O 

CN 

II 

3 

GSNR [dB] 

22.7416 (7 = 1) 

PSNR [dB] 

0.5371 

6.2171 

10.1869 

13.1035 

15.3269 

20.0383 


the bivariate isotropic Cauchy distribution with dispersion 7 . The number of snapshots available 
to the algorithms is M = 20. The GSNR and average PSNR for this experiment are shown on 
Table 2.3. 

Figure 2.22 shows the resulting MSE of the estimated DOA as a function of the GSNR. Again 
the MLC estimate has the best performance. As evident in Figure 2 . 22 (b), the performance of 
the MLC using a LS estimate for the signal degrades more rapidly for large values of the noise 
dispersion, 7 (low GSNR values). 

Characteristic exponent ol. The importance of this experiment rests in its study of the 
robustness of the algorithms in different noise environments. Of course, by design, the MLG 
estimator is optimal for additive Gaussian noise (a = 2 ), and the introduced MLC estimator is 
optimal for additive Cauchy noise (a = 1 ). An important property of any processor is to be able 
to perform reasonably well in a wide range of noise environments (1 < a < 2). Here, we test 
the performance of the estimators when the characteristic exponent, a, of the noise stable law is 
changing. 

Figure 2.23 shows the resulting MSE curves as functions of the characteristic exponent a. The 
number of snapshots available to the algorithms is M — 20. The GSNR is 22.7416 dB (7 = 1) 
and is shown together with the average PSNR, on Table 2.4. The CRB as a function of a was 
computed as shown in [24]. 

As we can clearly see, the Cauchy beamformer is practically insensitive to the changes of a, 
and for exact signal knowledge it almost achieves the CRB for the whole range of values a. On the 
other hand, both the MLG and the MUSIC algorithms exhibit very large mean-square estimation 
errors for non-Gaussian noise environments. Note that when a = 2, i.e., for the Gaussian noise 
case, the MLG method has the least MSE, as expected. 

The experiment demonstrates that for values of a in [1,2], the ML method based on the 
Cauchy noise assumption exhibits performance very close to the optimum. This observation, 
combined with the fact that the ML method based on the Cauchy assumption has computational 
complexity similar to the ML method based on the Gaussian assumption, justify the importance 
of the Cauchy beamformer for the DOA estimation problem in practice. 
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D. Concluding Remarks 

We considered the problem of target-angle and Doppler estimation with an airborne radar em¬ 
ploying space-time adaptive processing. We derived Cramer-Rao bounds on angle and Doppler 
estimator accuracy for the case of additive multivariate Cauchy interference of known underlying 
matrix. The bounds are functions of a generalized SNR function, similarly to the Gaussian case 
where the bounds are functions of the SNR. The proposed method is based on the maximum 
likelihood estimation technique where the noise is modeled as a complex isotropic SaS process. 
The Cauchy beamformer has been shown to give better bearing estimates than the Gaussian 
beamformer in a wide range of impulsive noise environments, and for very low SNR values. 

2.4.2 Subspace-Based STAP Using Lower-Order Statistics 

In this section, we present a new subspace-based method for joint spatiab and doppler-frequency 
high-resolution estimation in the presence of impulsive noise which can be modeled as a complex 
symmetric alpha-stable (SaS) process. We define the covariation matrix of the array sensor out¬ 
puts and show that eigendecomposition-based methods, such as the MUSIC algorithm, can be 
applied to the sample covariation matrix to extract the doppler/angle information from the mea¬ 
surements. The improved performance of the proposed source localization method in the presence 
of a wide range of impulsive noise environments is demonstrated via Monte Carlo experiments. 

A. STAP Problem Formulation 

Consider a uniformly spaced linear array radar antenna consisting of N elements, which transmits 
a coherent burst of M pulses at a constant pulse repetition frequency (PRF) f r and over a certain 
range of directions of interest. The array receives narrow-band signal returns reflected by q moving 
targets which are located at azimuth angles {9k\ k = 1,.. . } q} and have relative velocities with 
respect to the radar {^; k = 1,.. q] corresponding to Doppler frequencies {/*; k = 1,..., q}. 
Since the signals are narrow-band, the propagation delay across the array is much smaller than the 
reciprocal of the signal bandwidth, and it follows that, by using a complex envelop representation, 
the array output can be expressed as [23]: 

x(t) = V(0, zu)s(t) + n(£), (2.70) 


where 

• x(t) = [xi (£),.. .,£mtv( 0] T the array output vector (TV: number of array elements, M: 
number of pulses, t may refer to the number of the coherent processing intervals (CPI’s) 
available at the receiver); 

• s (t) — [$i(£),..., s g (^)]- r is the signal vector emitted by the sources as received at the 
reference sensor 1 of the array; 

• V(0, zu) = [v(#i,a7i),..., v($ g , w q )} is the space-time steering matrix (zuk = j^); 

• Space-Time steering vector: v^k^k) = b (zuk) ®a(i9jt); 

— a( 7 ?fc) = [l,e j27r ^*,..., gXN-ipTi-tffcjT th e spatial steering vector ($k = ^-cos(^)); 

— b (wk) = [1, c j27rOTfc ,..., e 3( M ~ l ) 2 Kw k Y ; s the temporal steering vector. 
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• n(t) = [ni(^),.. n M N(t)] T is the noise vector. 

Assuming the availability of P CPI’s ii,..., tp , the data can be expressed as 


X = V(0,-^)S + N, 

(2.71) 

where X and N are the MN X P matrices 


X = [x(ti),...,x(4/>)J, 

(2.72) 

N = [n(ti),...,n(tp)], 

(2.73) 

and S is the q x P matrix 

S = [s(t 1 ),...,s(t P )}. 

Our objective is to jointly estimate the directions-of-arrival {#fc; k = 

(2.74) 

1 ,..., q] and the Doppler 


frequencies {/&; k = 1 ,..., q} of the source targets. 


B. The Array Covariation Matrix 

We will assume that the q signal waveforms are non-coherent, statistically independent, complex 
isotropic SaS (1 < a < 2) random processes with zero location parameter and covariation matrix 
r 5 = diag(7 Sl ,.. ., 7 Sq ). Also, the noise vector n(t) is a complex isotropic SaS random process 
with the same characteristic exponent a as the signals. The noise is assumed to be independent 
of the signals with covariation matrix IV = 7nl* 

Equation (2.70) can be written as 


x(t) = w (t) + n(i), 


(2.75) 


where w (t) = V(©,zv)s(t). By the stability property, it follows that w (t) is also a complex 
isotropic ScxS random vector with components 

Wi(t) — Vj(0, xu)s(t) = Vi($ 1 , +-h Vi{dq, Wq)s q (t) i = 1,•.., MN. (2.76) 

Also, it holds that w (t) is independent of n{t ). 

Now, we define the covariation matrix , IV 5 of the observation vector process x(t) as the 

matrix whose elements are the covariations \xi(t^Xj(f]\ a of the components of x(£). We have 
that 

[Xi(t),Xj{t)] a = [Wi(t) + Ui{t), Wj{t) + nj(t)]a 

= [wi(t), Wj(t)\ a + |Vi(£), nj{t)]a + [^*(0> “I" V Wl a * (2*77) 

By the independence assumption of w (t) and n (t) and by property P3 we have that 

[Wi{t),n j {t)\a = 0 , ( 2 . 78 ) 


and 


[rii(t),Wj(t)] Q = 0. 


(2.79) 
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Also, by using (2.76) and properties PI and P2 it follows that 


K(t)> Wj(t)]cr 


= [J2 &k)Sk{t),Wj(t)] Q 

k =1 
9 

= V i($k, &k)[Sk(t), Wj(t)] a 

k=1 

9 9 

= ^k^k)[Sk{t),^2vj(0l)si(t)] a 

k=l 1=1 

9 

= v 0k^k)^f a ~ l> {^k^k)^8 k y 

k= 1 


(2.80) 


where y Sk = [s*., Finally, due to the noise assumption made earlier, it holds that 


\p*i (0 j Hj (0]cv — Tfn&ifji (2.81) 

where Sij is the Kronecker delta function. Combining (2.77)-(2.81) we obtain the following 
expression for the covariations of the sensor measurements: 


9 

[xi(t),Xj(t)] a = Y Vii'&k, w A .)u j < a_1> (?? A .,CT A: )7 Si + 7 n 6 it j i,j = 1,.. MN. (2.82) 

k =1 

In addition, the dispersion and covariation coefficients of the array sensor measurements are given 
respectively by 

9 

1xj(t) = Y&k)\ a 7s k + In j = l,...,MN, (2.83) 

k= 1 

and 




ELl v i(0k, ™k)Vj ('Afc, k)ls k + T n&i,j 

ELi l u i(^,^)| a 7 Sfc + 7n 


z, j = 1,..MA 


(2.84) 


In matrix form, (2.82) gives the following expression for the covariation matrix of the observation 
vector: 

T x = [x(i),x(i)]« = V(0,^)r s V <a - 1> (0,w) + 7 n I, (2.85) 

where the (i,j)th element of matrix V <a_1> (0, zu) results from the (j, i)th element ofV(©,^) 
according to the operation 


[V<“-»(0, w)]y = [V(®, = | [V(®, w)]* r’ [V(0, =>)]*,- 


( 2 . 86 ) 


Clearly, when a = 2, i.e., for Gaussian distributed signals and noise, the expression for the 
covariation matrix is identical to the well-known expression for the covariance matrix: 

R* = V(0, zu)£V^(0, w) + cr 2 I, (2.87) 


where £ is the signal covariance matrix. 
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When the amplitude response of the sensors equals unity, it follows that 


[V<“- 1 >(0, w)]ij = [V(6>, w)]* jti , (2.88) 

and thus the covariation matrix can be written as 

T x = V(0, w)T s V h {0, w) + 7nl. (2.89) 

Also, from (2.83) and (2.84) the dispersion and covariation coefficients of the array sensor mea¬ 
surements can be written as 


7 x i (t) = '52'Ysk + 'rn 3 = !>•••) MN, 

k=l 


(2.90) 


and 


A 




ELl Vii&k, &khsk + 7nA,j 

Ylk=i d" 7n 


i,j = 1,..., MN. 


(2.91) 


Observing (2.89), we conclude that standard subspace techniques can be applied to the covari¬ 
ation or the covariation coefficient matrices of the observation vector to extract the bearing infor¬ 
mation. More specifically, it follows that the rank of matrix V(0, m)T sV H (0, zu) is g, with the 
smallest (MN — q) of its eigenvalues equal to zero. In other words, if we let p\ > p 2 > * • * > Pmn 
denote the eigenvalues of matrix Tx, then 


Pq+l = * * * = PMN = 7n* 


(2.92) 


By denoting the corresponding eigenvectors of Tx by {uthe ROC-MUSIC spectrum can 
be expressed as 


Sroc-music^, &) — 


ESi |vtf(i?,OT) U| 


.12 


(2.93) 


The locations of the source signals are determined by the values of 6 for which the spectrum 
given by (2.93) peaks. 

In practice, we have to estimate the covariation matrix from a finite number of array sen¬ 
sor measurements. A proposed estimator for the covariation coefficient A Xi (t),xj(t) called the 
fractional lower order (FLOM) estimator and is given by [26] 



(*)»*> 00 


EjU (t) 

E?=il*iWl p 


(2.94) 


for some 0 < p < a. We will refer to the new algorithm resulting from the eigendecomposition of 
the array covariation coefficient matrix as the Ro bust Covariation-Based MUSIC or ROC- 


MUSIC. 


C. Simulations 

In this section, we study the resolution capability and estimation accuracy of ROC-MUSIC versus 
MUSIC as functions of the noise characteristic exponent a, and the separation of the two sources. 
The array is linear with five sensors spaced half wavelength apart (N = 5). The number of 
transmitted pulses is M = 10. The noise follows the bivariate isotropic stable distribution. 
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Table 2.5: GSNR and average PSNR for different values of a. 



Noise Characteristic Exponent a 

a = 1.01 

a = 1.2 

a = 1.4 

a = 1.6 

a = 1.8 

O 

CN 

i! 

$ 

GSNR [dB] 

22.3 ft 

' = 1) 

PSNR [dB] 

-17.9690 

-7.8245 

-2.8001 

-0.8380 

-0.2164 

-0.0441 


Characteristic exponent a. In this experiment, we evaluate the performance of the two 
algorithms in a wide range of noise environments, from the more impulsive (a in the neighborhood 
of 1) to the Gaussian ones (a = 2). The angles of arrival for the two signals are 9\ = —5° and 
0 2 = 5°. The GSNR is 22.3 dB (7 = 1) and is shown together with the average PSNR on 
Table 2.5. The characteristic exponent a of the additive noise is unknown to the ROC-MUSIC 
algorithm. We use two values of the parameter p in the estimation of the covariation matrix (cf. 
(2.94)): p — 0.8 and p = 0.4. Clearly, MUSIC can be thought as a special case of ROC-MUSIC 
with p — 2. 

In Figures 2.24-2.28, space-time spectral estimates obtained in five independent trials are 
shown for the ROC-MUSIC and MUSIC algorithms. Five types of alpha stable noise corre¬ 
sponding to three values of the characteristic exponent a = 1.1, 1.3, 1.5, 1.8, 2.0 were used. 
We can see that the MUSIC method exhibits high-resolution performance only for the case of 
Gaussian additive noise. On the other hand, the ROC-MUSIC method exhibits better resolution 
capabilities for non-Gaussian additive noise environments. 

Figure 2.29 depicts the improved performance of ROC-MUSIC over that of MUSIC both 
in terms of resolution probability and MSE, for values of a in the range (1,2). Note that for 
a < 1.2, MUSIC does not resolve the two sources in any of the 200 Monte Carlo runs. The 
results suggest that in impulsive noise environments modeled under the stable law, it is more 
beneficial to use the covariation matrix (lower-order moments) instead of the covariance matrix 
(second-order moments). Of course, for Gaussian additive noise (a = 2) the use of second-order 
moments ( p — 2) gives better results. 

Angular separation. The resolution analysis of the two algorithms was studied by using a 
popular resolution criterion defined by the following threshold equation: 

A(01,02) = P(0m) - \{P(6i) + P(0 2 )} > 0 , (2.95) 

where 9\ and 62 are the angles of arrival of the two signals, 6 m = (0i + O 2 )/2 is the mid-range 

between them, and the null spectrum P(0) = 1/S(9) is defined as the reciprocal of the spatial 
spectrum S(9) given in (2.93). The two signals are said to be resolvable if inequality (2.95) holds. 
The inequality implies that the null spectrum magnitude at the mid-angle should lie above the 
line segment linking the two signal valleys, in order for the two sources to be resolvable. In our 
experiments, we estimated the null spectrum from a finite number of array sensor measurements 
and we determined the probability of resolution by averaging the resolution events over the 200 
independent Monte Carlo runs. 

Figure 2.30 illustrates the variation of the algorithmic performance with respect to the spatial 
angle separation of the two incoming signals for GSNR= 22.3 dB, PSNR(ce = 1.5) — —1.56 dB, 
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and PSNR(a = 1.8) = -0.22 dB. As expected, the resolution capability of both algorithms 
improves with increased angle separation between the two sources. But for a given probability 
of resolution, the ROC-MUSIC algorithm requires a lower angle separation threshold than the 

MUSIC algorithm. 
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Appendix A: Proof of Proposition 2.3 


Define 


T x = 


To = 


T 3 = 


1 

K 


K 


k =1 

1 K 

r 7 £ 1 ^ I” 

As=l 

1 * 

1 / vA:\<p-l> 

tv' / , ^ m n / 

A fc=l 
K 


I< 


T, = *Ei* 


/cip 
7Z I 


fc= 1 


The vector 


Ti 

T 2 

t 3 

t 4 


is asymptotically (as if —> oo) normal with mean rrvj — 


-1 

r \i 

^ 12 

^ 13 

** 14 


and correlation matrix R r = 

^21 

^22 

P 23 

^ 24 

, where 


7*31 

^32 

T 33 

^34 



_ r 4i 

^42 

^43 

r 44 m 


rn 

— 


- 2 } 


r 12 = 

ri3 = 

r i4 = 
T 21 = 

T 22 = 

^ 23 = 

^24 = 

^ 31 = 

r 3 2 
r 33 


^ 34 

^ 41 
^42 




IrSiXiiX^-^XmiXn)^-^} 


K 

1 


,T{X l -(X j )<^ 1 >|X„| p } 


K 

* 


= r 


' 12 

^£{\Xj\ 3p } 

±£{X m (X n )<v- l> \Xj\*l 
±£{\X n \>\Xj\*} 

r 13 
* 


23 


-^{|X m | 2 |X„| 2p - 2 } 


^{X m (X n ) 


<p-l> 


l*n| P } 


= r 


14 

.* 

24 


£{\Xn\ P } 
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r 43 = 


Let 


and define G = 


34 


r 44 = -^{|*n| 2p }- 


9 i(ui, u 2 ,u 3 ,u 4 ) = C{p,a)ui /p u^ p 1 

fif 2 (wi, w 2 ,«3, w 4 ) = C'(p,a)«3 /p U4 /p_1 


dg 1 

®JtL 

dg 1 

dg 1 1 

dui 

du 2 

dus 


dg 2 

832 

dg 2 

dg 2 

. du\ 

dU2 

du3 

dv>4 . 


, where 


flgi 

du 2 

dg\ 

du 3 

dgi 

du 4 

dg2 

dui 

dg2 

du 2 

dg 2 

du 3 

dff2 

dud 


C(p, ot)—{u\u 2 ) a l p ~ l 
P 


c( P ,«)(^iK«r 2 


= 0 


= 0 


= 0 


= 0 


C{p,a)-(u 3 u 4 ) a/p 1 
P 


C(p,<*)(-- 1 )u£uG 


P 


and G is evaluated at 


= m T 


Ui 

U2 
U3 
U 4 

^i(T 1? T 2 ,r3,r 4 ) 

g 2 {Ti,T 2 ,T 3 ,T 4 ) \ 

and covariance matrix G{R r — IRtWLt)^- 


Then: 


-1 

’£>> 


[ c mn _ 



will be asymptotically normal with mean 


C 

C, 


13 


mn 


Appendix B: Proof of Proposition 2.4 

A 

From the results in Appendix A and (2.42), we immediately see that Rjj is asymptotically normal 
with mean Rjj and variance ' -1 ) 2 , where £ is the asymptotic error variance in 


the estimate Cij. 


2 —a 


Let now h{u\,U 2 ) = 2 uiu 2 a . Then: 

Oh 

du\ 


2 —a. 

= 2u 2 a 
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Therefore: 


dh 

du 2 


O — rv 2-2a 

2«i(- )u 2 a 

a 


VR, 


2 —a 


= [2C* 1 — ,2( 


O' . 2—2a _ 


CijCij 


On - a 

^33 


n- 2 /x 2 

Cij Cj j Cjj & 3 j 


2—a 

2C - Q 

33 


2( 2 ?)C«C' iJ " 


2—2a 


Appendix C: Derivation of CRB for Complex Isotropic Cauchy 
Noise 


The log-likelihood function is given by (2.61) 


JVM 


L(&) = -- !°g (t 2 + \ x i - 2 ). 


2=1 


The derivatives of L(0) with respect to x, y], are as follows 


dL 

dip 


o MN i o 

“2 £ fTW ■ ^ [t + ' M * [Xl ~ pvi)] 

' 1 


MN 


Q lvl 1 V -1 

= • t-n%,(«•* - /»‘/<oKwK] 

2 — 1 

7 2 + K'l 2 


where df = dai/dip. Likewise, we obtain: 


dL _ m ^X(»-)(4(o)* n »> 

eta — 7 2 +|^-| 2 




where eft = dbi/du . We also have that 


eta 


MTV 


<r\ 1V1 IV -| 

! E 7TKF ■ ■ 

2 — 1 




7 2 + KI 2 


(2.96) 


(2.97) 


and 


5L 


MW 


§ E t 2 + |„.|2 ' - 3“s(i) i ’/(i) n . r ! 


7 2 + l n «| 2 


7^ + n , 
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Now, we have to calculate the components of the Fisher information matrix. 



Imnmn s [/ r 6 ;, 


= 



l. *=1 3=1 


7 2 + \ni\ 2 




7^ + rc. 


(2.98) 


From the assumption made in Section 2.4.1, and observing that {|a;| = 1, i — 1 , 2 , • • •, (TV — 1)} 
and {|&i| = 1, i = 1, 2, • • •, (M — 1)}, we obtain: 

(W6J t0 (d S(0 )* n ,])^ 

■ (t 2 +KI 2 ) 2 



9 y e / (l^ll rf g(olK-l cos (-^ - #(o - ^(o 


n /2 + 4 > f )) 2 


2=1 


(t 2 + Kl 2 ) 2 


where: 


= arctan ^ 


3fi[n,*] 


(j >0 = arctan ( — 

1 x 


(2.99) 


and 


=arctan (HI) 


<j>i = arctan 


.Sfo] 


( 2 . 100 ) 


In addition, 



MAT 

i =1 


E 


m 


(7 2 + K| 2 ) 2 




7T 


Using the trigonometric propriety cos 2 ($) = 4(1 + 008(2-!?)), we obtain: 


/2 + <^")} 
( 2 . 101 ) 



9 MN r 

= V E [|4,I¥I’E 

q MN 

1/7 E K(<tf 

2=1 

w’Ewi*. 


2 (1 + cos(-2 <f>p - 2<f>f(i) ~ 20®( t *) - 7r + 20") 


57 2 

3M 
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2 — 1 


In the same way we can find: 



3 N 
67 2 


M 


i/7Eki 


612 


( 2 . 102 ) 
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Now, we can obtain 



hh 1 t 2 +ki 2 

l (7 2 + KI 2 ) 2 


7 2 + |n,- 2 
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\ + \ cos (- 2 ^(0 


+ 2 C) 


Likewise we obtain: 


MN O 1 O 

9 V-=- MN 

fr[ 15 t 2 2 5 t 2 


3L 


■MN. 


(2.103) 


Now: 
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Likewise we obtain: 


Finally, 
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dL_dL_ 

dx dy 
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f dLjdL_ 

1 du> dy 


M 
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nxy: \d% 
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cos (-^(o - ^}(i) + €) • sin (-^/« - <%) - €) 


The Fisher Information matrix will be: 


J(©) = 


by 


M\(3\ 2 || d a 

m 2 p 

yM8 a 
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where || d a || 2 = Eili \dai/d-ip\ 2 and || d b || 2 = Ei=i \dbi/du\ 2 , and: 


2 _ sr'M 


(2.106) 


and 
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M 

<^ = £ 141 , 
2 = 1 


MN 


P = £ l d l«>ll d /(.)l- 


2 = 1 


(2.107) 


(2.108) 


(2.109) 


Using the matrix inversion lemma [27] and after some simple matrix multiplication we have 


j — 1 
J 2x2 
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d' II 2 — M ^6 ) 
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( 2 . 110 ) 


where 


Z=(M || d“ 


M 

TV 


ii <•* 


( 2 . 111 ) 


From (2.110) the bounds on the variances of the spatial and temporal frequency estimates are 
obtained as 

CRB(i>) = ^ • 57V( H ^ E -% /M l (2.112) 


and 
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CRB(uj) — ?2 5M ^l d 
C Kn[cv) x ^ 


3£ 


a ||2 


S 2 JN) 
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(2.113) 
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Then, by the invariance principle for maximum likelihood, we can find the bounds tor <j) an d /• 
If we let w = W(@), we can express the Cramer-Rao bound for the estimator w by means of the 
Cramer-Rao bound for 0 [28]. In fact, we have the follow property for the two Fisher information 


matrices: 

J(w) = GJ(0)G t , 


(2.114) 


where G — { dOj/dwi }. In our case G is diagonal 


cos (j> 0 0 0 

0 2 nT r 0 0 

0 0 10 

0 0 0 1 


(2.115) 


Using (2.106), (2.114) and again the partitioned inverse matrix lemma we obtain the bounds in 
(2.64) and (2.65). I 
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5 


l-component of the clutter data on Sensor#1 at 15 degrees (scaled by 1e+5). 


3 


Q-component of the clutter data on Sensor#1 at 15 degrees (scaled by 1e+5). 



Figure 2.4: Measured I/Q-components of radar clutter: Azimuth: 15°, estimated mean: 0.0043 + 
j*0.0099, estimated a = 0.7318, y = 0.0413. 



Figure 2.5: Comparison of the empirical, SaS, and Gaussian amplitude probability distributions. 
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Q-component of the clutter data on Sensor#1 at 45 degrees (scaled by 1e+5). 
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Figure 2.6: Measured I/Q-components of radar clutter: Azimuth: 45°, estimated mean 
—0.0121 — j0.0247, estimated a = 0.8301, 7 = 0.0537. 
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Figure 2.7: Comparison of the empirical, SaS, and Gaussian amplitude probability distributions. 
















15 


l-component of the clutter data on Sensor#1 at 60 degrees (scaled by 1e+5). 


10 


Q-component of the clutter data on Sensorfl at 60 degrees (scaled by 1e+5). 



Figure 2.8: Measured I/Q-components of radar clutter: Azimuth: 60°, estimated mean: 0.0051 + 
j‘0.005, estimated a = 0.6886, 7 = 0.0394. 



Figure 2.9: Comparison of the empirical APD with Gaussian and SaS models. 
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l-component of the clutter data on Sensor#1 at 120 degrees (scaled by 1e+5). 


Q-component of the clutter data on Sensor#1 at 120 degrees (scaled by 1e+5). 
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Figure 2.12: Measured I/Q-components of radar clutter: Azimuth: 120°, estimated mean 
—0.0199 + j0.0215, estimated a = 1.1314, 7 = 0.0171. 
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Figure 2.13: Comparison of the empirical APD with Gaussian and SaS models. 
































l-component of the clutter data on Sensor#1 at 240 degrees (scaled by 1e+5). 


Q-component of the clutter data on Sensor#1 at 240 degrees (scaled by 1e+5). 


























l-component of the clutter data on $ensor#1 at 315 degrees (scaled by 1e+5). 


Q-component of the clutter data on Sensor#1 at 315 degrees (scaled by 1e+5). 
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Figure 2.16: Measured I/Q-components of radar clutter: Azimuth: 315°, estimated mean: 0.017+ 
j0.0079, estimated a — 1.0304, 7 = 0.0193. 
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Figure 2.17: Comparison of the empirical APD with Gaussian and SaS models. 






























alpha = 2, Gaussian estimate alpha = 2, FLOS-based estimate 
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alpha =1.5, Gaussian estimate alpha = 1.5, FLOS-based estimate 
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Figure 2.18: Illustration of the performance of estimators of the underlying matrix of a subGaus- 
sian vector. 



Probability of False Alarm Probability of False Alarm 

Figure 2.19: Comparison of the small sample performance of the Gaussian (dotted line) and the 
Cauchy (solid line) detector. 
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Figure 2.20: Performance of the Gaussian (left column) and the Cauchy (right column) detector 
as a function of the characteristic exponent a. 
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Figure 2.23: MSE of the estimated DOA and CRB as functions of the characteristic exponent a 
(a) Exact signal knowledge, (b) Least-squares estimate of the signal. 

































MUSIC (N=5, M=10, TH=[-20 -40 40], D=[-.3 -.2 .3], a=1.8) 



MUSIC (N=5, M=10, TH=[-20 -40 40], D=[-.3 -.2 .3], a=1.8) 



(a) 


(b) 


ROC-MUSIC (N=5, M=10, TH=[-20 -40 40], D=[-.3 -.2 .3], a=1.8) 





(d) 


Figure 2.27: MUSIC (a-b) and ROC-MUSIC (c-d) angle-Doppler spectra (N - 5, M - 10, 
© = [-20°, -40°, 40°] D = [-0.3, -0.2, 0.3]). Additive stable noise (a = 1.8, 7 = 4 )- 
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(a) 




Figure 2.29: Probability of resolution (a) and mean square error (b) as functions of the charac¬ 
teristic exponent a . 
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Part 3 


Scalable Portable Parallel 
Algorithms for STAP Applications 


Investigator: Professor Viktor K* Prasanna 

Collaborator: Young Won Lim 


To take advantage of High Performance Computing (HPC) platforms for radar signal process¬ 
ing, scalable parallel algorithms and portable implementations are essential. The algorithmic 
techniques must exploit the HPC architectures. These techniques will be significantly different 
from earlier fine grain custom VLSI approach [5, 6]. In this part, scalable portable algorithms 
on HPC platforms are developed for Higher-Order Post-Doppler (HOPD) STAP. This problem 
is computationally challenging because of the amount of computations to be performed and the 
need for real-time performance. 

We assume a general model of High Performance Computing platforms to develop our al¬ 
gorithms and perform their scalability analysis. Our algorithms are written in MPI (Message 
Passing Interface) and they can be easily ported to various HPC platforms. In HOPD, Doppler 
processing (FFT computations) is followed by solving least square problems (QR Decomposi¬ 
tions). These steps comprise most of the computation time in HOPD processing. Hoever, the 
data is accessed in orthogonal directions during these steps. Instead of finding a fixed single 
data mapping scheme, we use two different data mapping schemes so that all the necessary data 
can be contained within a single processor while performing computations during these steps. In 
other words, by data remapping between Doppler processing and QR decomposition, we derive 
scalable algorithms for HOPD STAP. 

The QR decomposition step causes the major computational bottleneck in HOPD STAP. If 
state-of-the-art parallel QR decomposition algorithms such as ScaLAPACK [7] are used, then 
frequent short message communications are inevitable because of sequential bottlenecks in such 
algorithms. Also, the communication time of the HOPD STAP implementation increases as 
the number of processors increases due to the collective communication patterns in the QR 
decomposition step. Hence, the efficiency of these algorithms becomes very low for small size 
matrices (less than 1000 x 1000) which are typically encountered in HOPD STAP. It should be 
noted that many QR decompositions are to be solved successively in HOPD STAP. Thus, the 


98 


cumulative parallelizing overheads of these algorithms are too excessive to be used for parallelizing 
HOPDSTAR 

Our scalability analysis shows that the communication time of the proposed algorithm de¬ 
creases as the number of processors increases. This is possible because as more processors are 
employed, smaller amount of data is allocated to each processor. As a result, the amount of 
data to be communicated by each processor during data remapping decreases as the number of 
processors increases. Also, data remapping eliminates frequent short message communications 
that occurs in parallel QR decomposition algorithms. Our experimental results show that ad¬ 
ditional reduction in the communication time of data remapping is possible by overlapping the 
communications with the computations and scheduling the communications. 

We also parallelize weight computation steps of Element Space Pre-Doppler (ESPrD) and 
Element Space Post-Doppler (ESPsD) STAP, which are the major computational bottlenecks in 
the entire processing. We develop efficient distribution algorithms when a small set of processors 
distribute their data equally to a large set of processors. This happens frequently in the gen¬ 
eral real-time signal processing applications, because the number of sensors is limited while the 
number of processing nodes required for real-time needs is much greater. Also, when exploiting 
task parallelism, computationally intensive tasks must be distributed among many processors to 
alleviate computational bottleneck. Simple communication scheduling schemes will incur node 
contention during distribution. Thus, we develop efficient distribution algorithms by using data 
remapping and processor groups, which will therefore reduce the effect of node contention signif¬ 
icantly. Although we have focused on two variations of STAP, the techniques of communication 
scheduling and overlapping computation with communication are adequately general. They can 
therefore be applied to several other signal processing computations to yield efficient implemen¬ 
tations. 


3.1 Overview 

The primary goal of a radar system is to detect/track targets over a surveillance volume. The 
system operates by transmitting signals, receiving echoed signals from reflecting objects, and 
processing those signals. Radar signal processing encompasses modulation theory, detection and 
estimation theory, and performance evaluation. With recent improvement in computing technol¬ 
ogy, attention has been focused on building next generation radar systems. Such systems will 
be required to provide longer range detection of increasingly smaller targets. They also must be 
able to operate in the presence of hostile electronic counter measures under unfavorable environ¬ 
ments. Thus, these systems require complicated signal processing techniques which demand more 
computational power than traditional techniques. STAP (Space-Time Adaptive Processing) is 
such technique. STAP refers to adaptive filtering algorithms that utilize both spatial and tempo¬ 
ral information Many variations exist in STAP algorithms. Each one has different performance 

characteristics and different computational complexity [1], 

Several special purpose architectures have been designed for radar systems. They rely heavily 
on custom VLSI fine grain computations. This approach provides limited flexibility and there¬ 
fore requires substantial redesign when new radar signal processing algorithms are developed. 
Recently, processing nodes interconnected by a high speed network become increasingly popular 
as HPC (High Performance Computing) platforms. Advances in networking and HPC technolo¬ 
gies lead to the development of next generation radar Systems. Such systems are promising 
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because of the following characteristics. They have a scalable architecture, i.e. the network 
interface and communication network are designed to be scalable and allow processing nodes to 
be easily attached. Their flexibility enables a wide range of signal processing applications such 
as radar, sonar, acoustic, and speech processing to be performed on the same platform by simply 
replacing application software. They can also effectively cope with new developments in signal 
processing theories. 

In such systems, coarse grain computation is performed by application software. The perfor¬ 
mance of such systems therefore greatly depends on the application software. It must be designed 
to be scalable otherwise the desired speed-up for the real-time response cannot be achieved due 
to overwhelming parallelizing overheads. Also, it must be designed to be portable so that it 
can be ported to various systems. Due to MPI standard [17], application software developed on 
general HPC platforms can be easily ported to new systems. In this report, issues that arise 
when realizing radar signal processing algorithms on general HPC platforms, will be addressed. 
It is necessary to develop algorithmic techniques to handle these issues, because current compil¬ 
ers cannot provide optimized performance. The goal of the this research is providing a general 
framework for developing high speed, scalable, portable radar signal processing algorithms on 
HPC platforms. Although our research will focus on radar signal processing, it will contribute to 
developing algorithms for general signal processing applications as well, since these exhibit com¬ 
mon computational characteristics. We believe our research is useful in designing next generation 
radar systems. 

The most important requirement in radar systems is real-time response. Despite advances in 
computer technology, real-time response is not an easily achievable task because of the need to 
process larger problems (i.e. surveillance volume has grown rapidly). Also, the signal processing 
area has evolved significantly in the last decade; new theories have been developed in statistical 
modeling, estimation processing area has evolved significantly in the last decade; new theories 
have been developed in statistical modeling, estimation methods, transformation theory, among 
others. These sophisticated signal processing algorithms require more computation power. The 
computational requirements of generic radar signal processing are in the range of GFLOPS. In 
some STAP applications, processing 400 Giga words of data necessitates 12 Tera FLOPS. We 
believe that scalable algorithms and portable implementations are essential to take advantage of 
HPC platforms as radar systems. Algorithmic techniques such as data remapping and overlapping 
communication with computation must exploit the HPC architecture, which are significantly 
different from earlier fine grain computation. 

Scalability, flexibility, programmability, and portability of HPC platforms motivate developing 
next generation radar systems for real-time applications such as radar systems. In the real-time 
applications, the guaranteed fast turn around time is the most important requirement. Both 
system software and application software should be designed to meet this requirement. 

System software must be designed so that operating systems and message passing libraries can 
ensure predictable computation and communication. Unlike traditional real-time systems which 
depend on custom VLSI fine grain computation, coarse grain computations must be performed by 
application software. Application software therefore plays an important role in these systems. It 
should be designed to be scalable, otherwise excessive parallelization overheads prevent achieving 

the desired speed-up for the real-time response. 

In this report, we focus on developing application software - scalable algorithms for radar 
signal processing on general HPC platforms. Emerging systems will retain minimal functionality 
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of general HPC platforms. Additionally, they will have advanced hardware and software features 
e.g. real-time OS, real-time MPI, and intelligent network interface to perform computations 
and communications more efficiently. Portability will also be maintained on those systems. Our 
developed algorithms ported onto those systems will therefore exhibit more improved performance 
by the advanced features of those systems. 

In general, a wide range of signal processing applications such as radar, sonar, acoustic, 
and speech processing consist of a sequence of tasks. Each task possesses different computational 
characteristics. Because large amount of data is to be successively processed in real-time, following 
issues must be addressed. 

• The computational requirement of each task is different from that of other’s. Therefore, 
particular task can be a computational bottleneck. Scheduling the computations to allocate 
the computational resources becomes important. 

• The data is accessed and processed in different ways in each task. It is difficult to find a 
data mapping which can be used through entire sequence of tasks. Even though such data 
mapping exists, changing data mapping between successive tasks can improve the overall 
performance. It is necessary to develop efficient data movement algorithms. 

• Latency as well as throughput is important. Utilizing various types of parallelism in coop¬ 
eration with data mapping is important. 

• Latency as well as throughput is important. Utilizing various types of parallelism in coop¬ 
eration with data mapping is important. Combining task and data parallelism is a possible 
solution for small latency and high throughput. MIMD coarse grain computation is there¬ 
fore more suitable than SIMD or SIMP. 

Current compiler technology cannot be used in real-time system because of following reasons. 
Finding optimal data mapping is known to be an NP-complete problem. Compilers can offer only 
limited way of data mapping e.g. block, cyclic, and block-cyclic. Loop parallelization involves 
large amount of unnecessary data movement between processors and increases communication 
time. Thus, in the this research, we will develop algorithmic techniques to handle previously 
mentioned issues. It will provide more optimized performance than what compilers can. We 
briefly list our approaches as follows. 

Data Remapping 

In the radar signal processing, the input data cube is accessed in many ways as the process¬ 
ing proceeds. Thus, the design of initial data mapping as well as data remapping between 
tasks is needed to minimize the communication time. The data mapping problem in gen¬ 
eral terms is to design a distribution of data among the processors such that the overall 
communication time is minimized. Data remapping rearranges the data mapping between 
the tasks of the application problem to reduce the communication time. 

Overlapping Communication with Computation 

Current network interfaces and switches offer extremely low hardware latencies; however, 
the software overheads can be in the range of tens of micro seconds. It is necessary to 
exploit latency hiding techniques in the algorithm design phase. We will exploit overlapping 
computation with communication to hide the overheads in performing the communication, 
in order to minimize the communication time in data remapping. 
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Mapping and Scheduling of Computation and Communication 
Most signal processing problems have regular and known computation and communication 
characteristics. Exploiting the regular and known computational characteristics of signal 
processing algorithms on HPC platforms is needed to obtain real time performance. Be¬ 
cause the computing power of the processors in HPC platforms are high, the mapping 
and scheduling should exploit coarse grain computations. To alleviate high communication 
overheads, communication should be carefully scheduled. 

To alleviate high communication overheads, communication should be carefully scheduled. 

In summary, we believe that new mapping and scheduling algorithms are needed to exploit 
HPC platforms for signal processing applications. These algorithms must take into account 
the granularity of the computations, large overheads in performing communication, and the 
capability to overlap computation with communication. The goal of the this research is developing 
a general framework to develop high speed, scalable, portable signal processing algorithms on 
HPC platforms to support real-time needs. 

In most HPC systems, lack of realistic computational models that capture the architectural 
features from a programmer’s perspective is a major bottleneck in designing and analyzing high 
performance systems. Lack of analytical models severely restricts the performance of algorithms, 
even though they are portable. Our parallel algorithms will be developed based on our analytic 
model and will consider the overheads in using HPC platforms. The features of the model will 
be exploited in algorithm design. We will use a well known definition of scalability. A parallel 
algorithm is considered scalable if the execution time of the algorithm on a machine with P 
processors varies as 1/P. Many HPC platforms are under development. Thus, architecture 
independent algorithms and portability of the developed code is very important. Specification of 
our algorithms using standards such as MPI will ensure portability of codes. 

3.1.1 Modeling High Performance Computing Platforms 

Most of the modern HPC platforms are coarse grained parallel processing systems, having a 
similar high level architecture. The architecture consists of three main components as shown 
in Figure 3.1: (a) Powerful computing nodes , often based on the same processors as those used 
in current-generation uniprocessor workstations, (b) A low latency high bandwidth network that 
interconnects the computing nodes. The network typically consists of high speed point-to-point 
links and routing switches, (c) Network interfaces that couple each processing node to the network 
links. 

Current commercial HPC platforms offer hundreds of Giga FLOPS of computing perfor¬ 
mance. Examples of such systems are the TMC CM-5, IBM SP-2, and Cray T3D, among others. 
These machines have been successfully used for a variety of high performance applications with 
good results. The chosen model of parallel computation should accurately depict the features 
of the machines, including the hardware features of computation and communication units, the 
method used for routing, the technique employed for handling congestion, and mechanisms for 
synchronization. In these machines, the overheads of performing communication are still high 
compared with the speed of processors. These machines are therefore suitable for coarse grain 
computations. 

For our analysis, we will model general HPC platforms as a set of high-performance SISD 
machines interacting through a high-speed network. Interprocessor communication is performed 
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Figure 3.1: Typical HPC platform 

using explicit message passing. Let T d denote the startup time for sending a message. Let r d 
denote the transmission rate (seconds per unit of data) for data communication. The startup 
time, including operating system and communication protocol processing overheads, is associated 
with each communication step. In current generation of HPC platforms, the network (hardware) 
latency is very small as compared with the software overheads in message passing. 

We make the following assumptions for our analysis: (1) Sending a message containing m units 
of data from a processor to another processor or exchanging a message of size m between a pair 
of processors takes T d + mT d time. (2) Suppose each processor has m units of data to be routed to 
a single destination and the set of all destinations is a permutation, then the data can be routed 
in Td + mr d time. (3) To perform a broadcast of a unit of data, a barrier synchronization, or a 
cooperative operation (sum, min, max, prefix sum etc.), 2(logp)T^ time is required. Broadcasting 
m units of data takes \ogp(T d mr^) time. (4) To perform a floating point operation, it takes t c 
time. 

3.1.2 Scalability 

A sequential algorithm is usually evaluated in terms of its execution time, expressed as a function 
of the size of its input. The execution time of a parallel algorithm depends not only on input 
size but also on the architecture of the parallel computers and the number of processors. A 
parallel system is the combination of an algorithm and the parallel architecture on which it is 
implemented. The scalability is a measure of its ability to achieve performance proportional to 
the number of processors. 

In the ideal case, the execution time of a scalable parallel system should linearly decrease 
with an increasing number of processors employed. However, in practice this is not the case, 
since there is a parallelizing overhead such as communication time and idle time. The purpose 
of scalability analysis is to predict and measure the performance of a parallel system. Several 















metrics have been proposed for measuring the scalability of parallel systems. 

The fixed problem analysis allows us to answer the question such as, “What is the fastest 
I can solve problem A on computer X?” and “What is the greatest number of processors I 
can utilize if I want to maintain an efficiency of 50 percent?”. Large parallel computers are 
frequently used not only to solve the fixed size problems faster, but also to solve larger problems. 
This encourages a different approach to the analysis of algorithms called scaled problem analysis, 
whereby we consider not how efficiency varies with the number of processors, but how the amount 
of computation performed must scale with the number of processors to keep efficiency constant. 

3.1.3 Portability and MPI 

The message passing paradigm is the most generally applicable and efficient programming model 
for parallel machines with distributed memory. It has been used widely in parallel and distributed 
computing systems for some years. Message passing is attractive for portability and performance 
since shared memory architectures can effectively execute message passing programs whereas the 
reverse is not generally the case. However, the development of parallel algorithms for applications 
has been hindered by the absence of a standard message passing interface. The portability of 
library routines is becoming increasingly important as codes are required to run on several vari¬ 
eties of parallel computers. Portable libraries must also allow for portability across programming 
languages. In order to obtain maximal reuse, the same libraries must be usable from several lan¬ 
guages. Standardization effort were begun in 1992, to define a message passing interface which 
would be efficiently implemented on a wide range of parallel and distributed computing systems. 
The main objectives of a message passing standard are portability and ease-of-use. Also the 
standard will provide vendors with a clearly defined set of routines that they could implement 
efficiently at a low level, or even provide opportunities for developing efficient hardware. It not 
only provides portability and ease-of-use, but also addresses to a limited extent the issues of 
program correctness and performance, because by providing high-level routines and abstractions, 
a standard can reduce the likelihood of programming errors, thereby enhancing program cor¬ 
rectness. MPI also allows to develop parallel application libraries. The computing platforms for 
MPI include homogeneous and heterogeneous parallel and distributed systems and even shared 
memory systems. 

The basic content of MPI is point-to-point communication between pairs of processes and 
collective communication within groups of processes. MPI also contains more advanced mes¬ 
sage passing features which allow the user to manipulate process groups, provide topological 
structure for process groups, and support the development and utilization of parallel libraries. 
Every message whether in point-to-point or collective communication has an associated data 
type. The primitive data types of the host language, for example INTEGER and REAL in Fortran, 
are supported. MPI also provides very general facilities which can be used to describe “derived” 
data types. The data types of MPI provide all the information required for data conversion in 
heterogeneous environments, and do not preclude efficient implementation of MPI in homoge¬ 
neous environments. They also allow the users to send and receive messages with complicated 
storage patterns without the need to copy data in to and out of message buffers, and allow an 
implementation to optimize communications with such storage patterns. 

Point-to-point and collective communications within MPI are performed within process groups. 
MPI defines a group as an ordered set of process identifiers, each of which is assigned a numerical 
rank within the group, between zero and the size of the group. Communications within MPI 
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are also performed within a communication context which insulates messages in different parts 
of the program from one another. The defining property of a context is that a message sent 
in one context can only be received in that same context. The communication context is the 
primary mechanism for isolation of messages in different libraries and the user program from 
one another. Process groups are user level objects in MPI but communication context are not 
directly visible. MTI bundles the process group and communication context concepts into a 
user level object called a communicator which provides communication services within a unique 
scope. MPI defines an initial communicator MPLCOMM-WORLD which has a group containing 
all processes of the program and a unique communication context. MPI provides routines which 
allow the user to dynamically create new communicators. MPLCOMMJDUP creates a duplicate 
of an existing communicator, i.e. a new communicator with the same group of processes and a 
different communication context. This routine is key to the construction of robust communicative 
parallel libraries. MPLCOMM.SPLIT creates one or more new communicators which contain 
distinct subgroups of an existing communicator and of course a different context. This routine is 
key to clear expression of task and control parallel programs. In the simplest use of MPI, which 
corresponds to a number of current communication libraries, MPLCOMM-WORLD is the only 

communicator used in the program. 

The point-to-point message-passing routines form the core of the MPI standard, the basic 
operations being send and receive. They allow messages to be sent between pairs of processes, 
with message selectivity based explicitly on message tag and source process, and implicitly on 
communication context. Each process can execute its own code, in MIMD style, and can be 
sequential or multi-threaded. There is no explicit support for threads, but care has been taken 
to make MPI “thread safe”, by avoiding the use of global state. The send receive primitives 
are provided in a blocking form in which the sender buffer can be reused immediately on return 
from send and the receiver buffer primitive MPLRECV. There are four blocking send primitive 

corresponding to the four communication modes in MPI. 

Standard The sender is blocked until the send buffer can be reused without altering the 
message. The receiver is blocked until the message has been copied into the receive buffer. 
Since the system is expected to copy the message subject to buffer resources, the send- 
reev pair does not guarantee synchronization. This mode seems to best represent common 
practice. The send primitive is MPI-SEND. 

Synchronous The sender is blocked until the receiver issues the corresponding receive. No 
system buffer is required and the message can be transferred without intermediate copies, 
at the expense of synchronization. The send primitive is MPLSSEND. 

Ready-receive The program is in error if the send is issued before the matching receive 
has been issued. This allows a simple protocol where the message is sent “ in hope” and 
dropped if there is no ready receive, but use demands special care. The send primitive is 

MPI-RSEND. 

Buffered This mode allows the user to control the space available for buffering within a 
defined buffer model, providing guaranteed portability for programs that demand message 
buffering. The send primitive blocks until the message is copied into the buffer space or is 
in error if insufficient buffer space was available. The send primitive is MPI-BSEND. 

MPI also provides primitives of the non-blocking, or immediate return, form, MPI_I?SEND and 
MPI J?RECV, in which the message buffer must not be used until the communication has corn- 
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pleted, similar to the immediate routines in NX/2. There is a small but comprehensive set of 
routines to test and wait for completion of non-blocking functions. This functionality allows the 
user to write programs which do not incur the overhead of copying message data into intermediate 
buffers. 

Collective communications are provided where all processes in a process group are involved in 
a collective operation. A collective function is called as if it contained a group synchronization, 
although this property is not mandated since efficient implementations may not synchronize. The 
following is description of some important collective communication routines. 

MPIJ3ARRIER Synchronization of all processes within a group. 

MPIJBCAST Every process within a group receives data broadcast by a root process. 

MPRGATHER Every process within a group sends data to a root which stores the data 
in rank order. 

MPI_SCATTER The inverse of MPI.GATHER, where a root process sends sections of 
data to every process within a group in rank order. 

MPIJREDUCE Performs a parallel reduction over every process within a group. The 
operation is selected from a set of defined arithmetic and logical operators or is described 
as a user function. The output is available to a root process, every process, or scattered 
over the processes. 

MPI also contains collective routines for all-to-all global communication, all-to-all personal com¬ 
munication also known as complete exchange, and inclusive parallel prefix also known as scan. 

3.2 HOPD STAP and its Computational Requirements 

The first step of HOPD STAP is Doppler processing which transforms a data cube into the 
Doppler domain; M -point FFT computations are performed along delay dimension for each ele¬ 
ment and each range-gate. Therefore, a total of NL M-point FFT computations are performed. 
The next step is weight computation. The transformed data is converted into least square prob¬ 
lems. There are (M-2) least square problems to be solved. The transformed data cube is divided 
into the M slabs along the range-gate dimension, which are referred to as Doppler bins as shown 
in Figure 3.2. The underlying matrix of each least square problem, is obtained by concatenating 
the three L X N size consecutive Doppler bins i — 1, i and i + 1 in an overlapped fashion. The 
resulting matrix has a size of L x 3 N. Each matrix is divided into two parts. The first part of the 
size Lis X 3 N defines the least square problem that is solved by QR decomposition. The solution 
of the least square problem i.e . the weight vector is applied to the remaining part ( L wa ) in the 
third step. L\ s will typically vary from 2(3iV) = 6 N to 5(3 N) = 15 N [11]. 

Figure 3.2(a) shows the typical value of each dimension of a data cube (delay dimension 
M—64, element dimension N=6 4, and range-gate dimension L=4096 (Li s = 1024) for a nominal 
size problem. All these parameters can change for a bigger size problem). In the Doppler 
processing stage, NL M-point FFT computations are performed. In the weight computation 
stage, there are (M-2) QR decomposition computations of size L\ s x 3 N. It is noted that only 
Li s =16N(=1024) parts of range-gates are used for the weight computation, as shown in Figure 
3.2(b). The weight application and detection stages have also many independent problems, but 
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the amount of computations is comparable to that of Doppler processing stage. The sequential 
m-point FFT problem requires mlogm complex multiplications and additions. Assuming one 
complex multiplication and addition operation take 5 flops*, the sequential m-point FFT takes 
5mlog mflops. The sequential QR decomposition of a m X n matrix needs n 2 (m - nj 3) flops 
[14]. Therefore, the estimated computational requirement of HOPD STAP is as follows. 

• Doppler Processing: NL M-point FFT Problems 

(NL) x 5 x MlogM = 480 x 2 20 flops = 480 Mflops 

• Weight Computation: (M-2) QR Decomposition Problems of size 16N x 3 N 

(M - 2) x {(3N) 2 x (161V - 31V/3)} = 2100 Mflops » 2 Gflops 

The entire HOPD STAP requires approximately 3Gflops. The real time requirements vary by 
the application. Suppose that the entire set of steps is to be computed in less than 1 second, then 
at least 54 POWER2 processors (operating at 66MHz) are needed for the real time computation 
of HPOD STAP in an ideal case. 


3*3 Scalable Parallel Algorithms for HOPD STAP 

In this section, we first discuss the characteristics of general HPC platforms for algorithm devel¬ 
opment and scalability analysis. Then, we develop techniques for data remapping, scheduling and 
overlapping communication with computation to obtain scalable parallel algorithms for HOPD 
STAP. 

3*3*1 Computational Model 

For our analysis, we will model the general HPC platforms as a set of high-performance SISD 
machines interacting through a high-speed network. Assuming MPI is used, interprocessor com¬ 
munication is performed using explicit message passing [8]. Let Td denote the startup time for 
sending a message. Let Td denote the transmission rate (seconds per unit of data) for data 
communication. The startup time, including operating system and communication protocol pro¬ 
cessing overheads, is associated with each communication step. In current generation of HPC 
platforms, the network (hardware) latency is very small as compared with the software overheads 
in message passing. 

We make the following assumptions for our analysis: (1) Sending a message containing m units 
of data from a processor to another processor or exchanging a message of size m between a pair 
of processors takes Td + mrd time. (2) Suppose each processor has m units of data to be routed to 
a single destination and the set of all destinations is a permutation, then the data can be routed 
in Td + mTd time. (3) To perform a broadcast of a unit of data, a barrier synchronization, or a 
cooperative operation (sum, min, max, prefix sum e£c.), 2(\og p)Td time is required. Broadcasting 
m units of data takes logp(T^ + mr^) time. (4) To perform a floating point operation, it takes t c 

time. 

For currently available HPC platforms, the ratio of Td to td is in the range of several hundreds 
to few thousands as shown Table 3.1^. To reduce the communication time, the high startup cost 
as well as the message length, must be considered. 

*C.B. Moler’s concept of a flop is used. One flop roughly constitutes the effort of doing a floating add, a 
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Table 3.1: Communication features of various HPC platforms. 


Machine 

Td{nsec) 

T d (fisec/byte) 

Td/r d 

T3D 

93 

0.043 

2163 

SP-2 

46 

0.035 

1314 

CM-5 

86 

0.12 

716 

Paragon 

82 

0.26 

315 

iPSC/860 



Mm 

iPSC/2 


0.36 

1944 


3.3.2 Scalability and Portability 

We use a standard definition of scalability and efficiency [13, 15]. A parallel algorithm is con¬ 
sidered scalable if the execution time of the algorithm on a machine with p processors varies as 
I. Efficiency is defined as pfr, where T s is the sequential execution time and T p is the parallel 
execution time. Our goal is to design scalable algorithms which provide high-speed execution on 
available partition sizes of machines for problem sizes useful to radar signal processing commu¬ 
nity. In our analysis, we consider machine sizes that are not “large”, for example, at this time, 
the largest installed accessible coarse grain machines have less than IK nodes. 

Our implementations were performed using the MPI standard, and therefore can be easily 
ported to other High Performance Computing (HPC) platforms. Our choice of MPI allows us 
to access many communication features provided by emerging HPC platforms. At this time, we 
have implemented our algorithms on SP-2 and T3D. 

3.3.3 Data Remapping, Scheduling, and Overlapping Computation with Com¬ 
munication 

For a scalable performance, efficient data mapping and data distribution schemes should be 
developed so that the parallelizing overhead is minimal; these schemes should minimize the inter¬ 
processor communication time. Our algorithm exploits these schemes and is scalable in the range 
of p< M. 

A. Data Remapping 

We choose data mapping schemes which allow a processor to contain all the necessary data for 
computing each task. In our approach, tasks such as FFT and QR decomposition do not need 
interprocessor communication. In the Doppler processing stage, the data cube is accessed along 
the delay dimension. However, in the weight computation stage which follows Doppler processing 
stage, the data cube is accessed along the range-gate dimension which is orthogonal to the delay 
dimension. Thus, different data mapping schemes are necessary for these computation steps. 
The remapping process refers to the transformation of the initial data mapping into another data 

floating point multiply, and a little subscripting [14]. 

^Some results were obtained by our own experiments. Others are from Technical Report No. UCB/CSD92 
^675, University of California, Berkeley. It is assumed that MPI is used for T3D. 
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mapping. Our scalability analysis in the following section shows that our approach using data 
remapping exhibits scalable performance characteristics, which allows each processor to compute 
sequentially as much as possible without interprocessor communication by providing the data 
required in it's computation. 

Initially, the data cube is partitioned along the range-gate dimension into blocks containing 
Li s /p snapshots and the processors are assigned to these partitioned data in a cyclic distribution 
as shown in Figure 3.3(a). We call this data mapping as snapshot mapping. Each processor has 
UojMK s i ze data. M- point FFTs can be performed in each processor without interprocessor 

communication. The other mapping is shown in Figure 3.3(b), which we name as slab mapping. 
Here, the L h part of slabs (Doppler bins) is partitioned along the delay dimension among the 
p processors. Note that only L[ s part of the data cube is used for weight computation and the 
rest ( L wa part) is used for weight application. The L wa part of the data cube is not remapped. 
The size of the data which will be remapped in each processor, is Lls ™ N ■ Each processor sends 

LlMK s i ze data to all the processors and receives size data from all the other processors 

as shown Figure 3.3(c). 

Each processor can form {M/p- 2) matrices of size L\ s X 37V for computing the weight vectors. 
Therefore, a total of {M/p- 2) QR decomposition problems can be solved without communication 
and two more problems can be solved by communicating with adjacent processors. Exchange of 
boundary data between the adjacent processors is needed. This is shown in Figure 3.4. This data 
exchange is performed during the remapping process. 

B. Scheduling 

As mentioned before, HOPD STAP consists of a sequence of tasks. The input to a task is output 
of the previous task. Due to the data dependency and different data access patterns between 
tasks, interprocessor communication is required. For scalable performance, a good scheduling 
scheme is necessary. The QR decomposition tasks are dependent on only Lu part of the data 
cube which were Doppler processed. The output of QR decomposition tasks are used along with 

the rest of the Doppler processed data for detection. 

Also, the scheduling of communication is important to fully exploit the available communi¬ 
cation bandwidth and to avoid node contention. During the remapping process, every processor 
distributes equal size data to all the other processors. It is a collective communication but it can 
be converted into {p- 1) steps of all-to-all personalized communications {i.e. permutations) by a 
simple scheduling. In each of the {p- 1) communication steps, each processor sends equal size data 
to another processor and receives equal size data from another processor. This exploits the fact 
that total p point-to-point communication can be performed in parallel among p processors. 

By employing a butterfly-like communication pattern, the number of communication steps 
can be reduced to logp and the startup overhead can be reduced by a factor of By using 

features such as communication group and context in MPI, the communication between processors 
can be localized. However, in this case, there is an additional overhead in buffer copying. 

C. Overlapping Communication with Computation 

We reduce the communication time of data remapping by scheduling the communications and by 
overlapping the communication with the computation. Using asynchronous non-blocking com¬ 
munication primitives provided by the MPI standard (such as MPI_Isend(), MPI_Irecv(), and 
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Figure 3.4: Boundary data exchange. 

MPI.WaitallO), we can overlap the communication needed for the remapping process with the 
FFT computations. First, each processor performs sequential FFT computations over L\ a portion 
of the data cube. As soon as these M -point FFT computations are computed, the remapping 

process can start. Asynchronous non-blocking commands MPI-IsendO and MPI_Irecv() return 
immediately after initiating communication without waiting for completion. Therefore, the rest 
of the FFT computations over L wa part of the data cube can be performed by processors, while 
the actual data transfer is handled by communication coprocessors. 

A processor can solve one QR decomposition problem, after receiving three consecutive slabs 
from the remapping process. Suppose the data cube is partitioned along the delay dimension by 
K such that > 3. The remapping process is divided into K (p-1) permutation steps. Then, 
the computation of QR decomposition can be also overlapped with the communication. 

D. Scalability analysis 

For scalability analysis, we assume that communication is not overlapped with computation. The 
experimental results of overlapping the remapping process with the computation are shown in 
Section 3.6. The remapping process consists of (p- 1) permutations. During each permutation, 
every processor sends and receives size data. Therefore, the communication time for the 

remapping process is 

V , LlsMN ^ /O 1\ 

Tremap = (p - 1){Trf + fc-^ 1 T d} (3-1) 

where k is the size of a data point in bytes. Boundary data exchanges involve two additional 
communications with message size of k ■ L h N • T d . The communication time for boundary data 
exchange becomes 2 {Td -f- k • Li s N • t^}. Also, for the weight application, the solutions of QR 
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decompositions, i.e. the weight vectors must be permuted among all the processors. This involves 
(. p - 1) all-to-all communications with message size of k • —. It takes ( p - 1 ){Td + k • — • 

r d } communication time. Because Li s N < ki*MK and y < ku MN w hen p < M, the total 
communication time of our approach is upper bounded by 

rp / rr_ , 3L[ S MN _ /n 

Tcomm ^ 2_p * Td “f ’ k m ' Td (3.2) 

P 

It is noted that 2p ■ T d <C k ■ ZkuMK . Td) f or the values taken by p (< M), Li s (= L/ 4), M, and 
N in typical STAP applications. For example, M= 64, N=6 4, L/ s =1024, and p=32, the startup 
overhead 2 p • Td= 2.9 msec which is much smaller than k • 3L h MN . Td= HQ msec for the SP-2. 
Therefore, T comm can be approximated as 


T 

± comm 





SLMN 




In summary, the total startup overheads are 2 p-Td and total amount of data to be communicated 
is approximately k • 3L ^ N • Td when data remapping is used. If data remapping is not used, then 
6(M- 2)N communication steps are required to solve (M- 2) L\ s x 3A^ size QR decompositions. 
(Solving one m x n size QR decomposition requires 2 n communication steps.) In this case, the 
startup overhead will be 6 (M — 2)N • Td- Thus, remapping which eliminates frequent short 
message communications, reduces startup overheads. Also, it reduces total amount of message 
to be communicated. This will be analyzed in detail in the Section 3.5. 

The sequential execution time T s consists of time for executing LN M -point FFTs, (M-2) 
L\s X 37V size QR decompositions, weight application, and detection. Weight application and 
the rest of the computations including detection are not computationally demanding. Thus, T s 
becomes 


T s = {9(M - 2)N 2 (L/4 + N) + lOLMNlogM} ■ t c (3.4) 


Due to the remapping of data, the load on the processors is well balanced. In other words, 
the FFT and QR decomposition computations are equally distributed among the processors. 
Therefore, the parallel execution time is 



T 

z± + T 

i comm 

p 

{9 (M - 2)N 2 (L/4 + N) + 10LMN log M} ■ — + k • -- - -- . n 

p 4 p 



rj! rp 

It should be noted that T comm <C In other words, the computation time (^) of each processor 
is much larger than the communication time (T comm ). Thus, the proposed algorithm is scalable 
in the sense that the execution time of the algorithm on a machine with p processors varies as ^ 
in the range of p < M. Efficiency becomes 





1 

1 I Xq omn 

T s /p 


Because 



p ’ 


near ideal efficiency is achieved. 
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3.4 Comparison with previous approaches 

A previous approach for HOPD processing uses row wrap mapping scheme in order to avoid data 
redistribution (remapping) [11]. In their approach, FFT computations can be performed sequen¬ 
tially without inter-processor communication. However, parallel QR decomposition is needed to 
exploit their row wrap mapping scheme. The parallel QR decomposition algorithm developed by 
Whitman et al. [12] was used in [11]. In [11], QR decomposition comprises most of the computa¬ 
tion as well as communication time. Their experimental results show that its communication time 
increases rapidly as the number of processors is increased. When a 32 node SP-1 is employed, 
the percentage of communication time is more than 50% of the total execution time. 

The parallel QR decomposition algorithm of [12] and its timing diagram for the first two steps 
are shown in Figure 3.5 and Figure 3.6. Processors are idle between PASS and SCALE steps and 
the processors communicate during PASS and BRDCAST steps. The communication pattern 
is collective for PASS and is broadcast for BRDCAST. PEO (which contains the main diagonal 
element an), can start SCALE step only after it has received the accumulated dot products from „ 
the other PE’s. During the SCALE step, all the processors except PEO remain idle. Also, after 
receiving the broadcast scale factors from PEO, all the processors can perform UPDATE. The 
estimated computation and the communication times are 

tcomp = {(n 2 + 2n) • ^ + n 2 + 5n} • t c (3.7) 

71? 3 Tl 

tcomm = {p+\ogp-l){n-T d + k-{—+—)-T d } (3.8) 

where rrt and n (m > n) represent the row and column size of the matrix. t c is the time for 
performing one floating point operation, and Td and Td are startup time and per-word transfer 
time, respectively. From Equation (3.8), the communication time increases in proportion top, the 
number of processors. There is another parallelizing overhead: there exists sequential bottleneck 
where only one processor is active and all the others are idling. The idle time of SCALE step is 
estimated in Equation (3.9) and during this time, (p-l)Udie useful work could be done otherwise. 

Udie — -(ft 2 + 7n)-£ c (3-9) 

To solve one QR decomposition problem, there are n communication steps and each step i 
involves collect and broadcast communications using message of size (n-i+ 1), where n represents 
the column size of the matrix involved. Therefore, a total of Ya=i 2(n —i+1) = n(n + 1) messages 
are communicated. Because there are (M- 2) QR decomposition problems of size Li s X 3N in the 
HOPD STAP, the total size of the messages to be communicated is 3N(3N + 1 )(M — 2). This 
is almost \-th of the entire data cube size we are considering. By assuming that all these data 
can be sent in two steps (i.e. collect and broadcast communications) by ignoring communication 
startup overheads and idle times, a lower bound on the communication time of the algorithm in 
[3] can be obtained. Using the time for point-to-point communication time as (Td + m • Td) and 
the time for broadcasting as (Td + m • Td) logp, the communication time ^comm in [11] is lower 
bounded as 

Tcomm > (p + logp - l){Td + k ■ ^N(SN+ 1)(M - 2) • Td) (3.10) 

« (p + log p) (fc^MTV 2 ) -T d (3.11) 
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The i-th step of the parallel QR decomposition based on House¬ 
holder transformation (z = n ,...,1 where n is the column size of 
the underlying matrix).: 



n i = 


1 


(«i+Yi) -yi 


A <— l / — KjUjUj I • A 

L (• 


(1) DOTPRD[z']: Compute the dot products of the local block of 

column z with columns Only use elements that are in 

matrix rows z through m. 

(2) ACCUMJ7]: Wait for the n-i+1 dot products from a successor 
processor, if there is one, and add on the local dot products. 

(3) PASS[z]: Send the revised n-i+1 dot products to a preceding 
processor, if there is one. 

(4) SCALED']: The processor which contains is the last to 

receive the revised dot products. It performs a square root opera¬ 
tion to determine the scale factor g - v adds it to the main diagonal 

element, corrects the inner products for the addition of g t - to the z- 

th element of u h and multiplies them by p v 

(5) BRDCAST[z]: Wait for the n-i+l scale factors from a preced¬ 
ing processor, if there is one, and pass them on to a successor 
processor, if there is one. 

(6) UPDATED*]: Update the local block of column z through n by 
subtracting multiples of the local block of column z. 


Figure 3.5: The parallel QR decomposition algorithm. 
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Figure 3.6: The timing diagram of the parallel QR decomposition algorithm for the first two 
steps. Four processors are assumed for the sake of illustration. 
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In other words, actual communication time of [11] will take at least (p + log p)(k^MN 2 ) • r^. 
Consider the case of M=64, A^=64, L/ s =1024, fc=8, and p= 32, then the communication time 
in the our algorithm is T comm « 384iiu;or<i • r<j, and the communication time in [11] becomes 
Tcomm > 40000 Kword • This explains why the efficiency of the algorithm in [11] drops 
drastically as the number of processors is increased. It should be also noted that there is significant 
processor idle time as shown in Equation (3.9). In contrast, the communication time of the 
proposed algorithm is inversely proportional to p as shown in Equation (3.3). If more processors 
are used in the range p < M, the communication time becomes less. Figure 3.7 shows the 
estimated communication time of the proposed algorithm and the communication time in [11] 
for the SP-2. As shown in Figure 3.7, T comm <C T f comm . It is also observed that T comm decreases 
while T f comm increases with increasing number of processors. 

ScaLAPACK [7] also provides QR decomposition routine. It uses BL AS (Basic Linear Algebra 
Subroutine), PBLAS (Parallel extension of BLAS), and BLACS(Basic Linear Algebra Commu¬ 
nication Subprogram). Because ScaLAPACK employs block cyclic data mapping, data redis¬ 
tribution is necessary between FFT computations and QR decompositions. The efficiency of 
ScaLAPACK drops significantly unless large size matrices are used. For example, it can deliver 
about sustained 21 GFLOPS on 16 X 32 node Paragon for the matrix size of 35000 X 35000. 
However, if the matrix size becomes less than 1000 X 1000, then less than 1 GFLOPS can be 
achieved on Paragon. This means that the parallelization overheads, z.e., communication and 
idle times dominate the computation time. Because the typical matrix size for HOPD STAP is 
167V x 5 N (1024 x 320 when N— 64), we cannot expect high efficiency by using ScaLAPACK. 


3*5 Experimental Results 

We have performed our implementations using C and MPI* on the SP-2 at Maui High Perfor¬ 
mance Computing Center and on the T3D at the Pittsburgh Supercomputing Center. Optimized 
versions of code provided by ESSL (Engineering Scientific Subroutine Library) were used in our 
SP-2 implementations. 

Remapping process can be performed either by using asynchronous non-blocking communi¬ 
cation or by using synchronous blocking communication as shown in Figure 3.8. Asynchronous 
non-blocking communication allows the communication in the remapping process to be overlapped 
with L wa parts of FFT computations. In this case, the actual time spent in communication ac- 
tivity becomes M0+M0+W*)) where t is (i), t ir (i), and t wt (i) are time spent in the i-th 
MPI.Isend, MPI.Irecv, and MPI_Waitall, respectively. In the case of non-blocking communica- 

<4 

tion, communication time becomes ^«r(0> where t sr (i ) represents time that takes in the i-th 

MPI_Sendrecv. Table 3.2 shows the time results of FFT and data remapping process. It should 
be noted that message size to be communicated decreases as the number of processors increases. 
Thus, the communication time of remapping process decreases as the number of processors in¬ 
creases up to M. Figure 3.9 illustrates the reduction in the remapping time by overlapping the 
remapping process with the computation. On both machines, the overlapped remapping process 

takes less than 37% of the non-overlapped one. 

SP-2 was configured in the dedicated mode to obtain the results of the HOPD STAP imple¬ 
mentation shown in Table 3.3. For the sake of comparison, an earlier implementation [11] of the 
same technique using the same parameters runs in about 30 seconds on a 32 node SP-1. 

! MPICH version was used. 
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Figure 3.7: Estimated T comm and T' omm on SP-2 for M=64, iV=64, L h = 1024. Each data is 
assumed as single precision complex number (8 bytes). 
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(a) Asynchronous Nonblocking communication 
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Figure 3.8: Asynchronous Nonblocking communication v.s. Synchronous Blocking 
tion for Remapping. 
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Table 3.2: Timing Results for computing FFT and Data Remapping on T3D and SP2. 
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a. Cray Library was not available. Numerical Recipies routine was used. 
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b. IBM ESSL library was used for computing FFT. MPICH was used. Loadleveller data. 


Table 3.3: Total Execution Times on SP-2 for M—N—64, L—4096. 


No. of PE’s 

p = l 

XJ 

II 

p = 8 

p = 16 

p = 32 

II 

C5 

SP-2 

55sec 

15 sec 

7.8 sec 

3.9 sec 

2.0 sec 

0.96 sec 


120 































































400.0 


300.0 

8 

C/3 

I 200.0 

<u 

6 

H 

100.0 


0.0 

Number of processors 


400.0 


300.0 


s 

•g 200.0 
6 
P 

100.0 


4 8 16 32 64 

Number of processors 

Figure 3.9: Experimental results of data remapping with overlapping for M—64, N= 64, L/*=1024 
on SP-2 and T3D. Each data is assumed as a single precision complex number (8 bytes). 




121 




3.6 A Methodology for the Design of Scalable Solutions 

Radar signal processing is typical of the applications that arise in embedded real-time environ¬ 
ments. In this section, we highlight a coarse-grained parallel approach to Space Time Adaptive 
Processing (STAP), a class of radar signal processing. We first describe a model of the target 
HPC platforms. We then perform scalability analysis on STAP algorithms, and demonstrate the 
design of efficient parallel algorithms for these tasks. 

Our approach to design efficient solutions for real-time applications on the HPC platforms 
consists of the following steps: 

• Definition of a realistic computational model of the HPC systems. The model represents 
the salient features of the underlying hardware architecture. 

• Analysis of the application task to be performed, and identifying the phases that are the 
most time consuming. 

• Quantification of the overheads incurred in executing the task on the architecture, with 
the assistance of the model. The overheads and the computation time are calculated as a 
function of the number of processors. 

These functions are then used to study the scalability of the application-architecture pair. If 
the total execution time (overheads + computation time) is inversely proportional to the number 
of processors, then the solution is said to be scalable. 

For our analysis of STAP tasks, we define a General purpose Distributed Memory (GDM) 
model. This model balances simplicity, accuracy and generality. We model the parallel system as 
a set of processors, interacting through a high speed network. Most HPC systems have a uniform 
high-level architecture of this form. 

The communication operation in such a system can be modeled with two parameters. One is 
startup time which occurs in every message transfer. This includes software and communication 
protocol processing overheads. The other is unit transmission time which is the cost of trans¬ 
ferring a message of unit length. Let Td denote the startup time and Td denote the transmission 
time. Then, a communication operation for sending a message containing m units of data from 
a module to another module takes Td + mi~d time. This is the case when “blocking” send is used 
and no special hardware for communication is available. During Td + mrd time, the sending pro¬ 
cessor is busy handling the communication. However, if there is a communication processor and 
nonblocking send is used, the sending processor has to spend only the startup time to activate the 
communication processor. After this initial startup time; the processor is free and can continue 
its computations, while actual transfer of messages is handled by the communication processor. 

The GDM model accurately represents the communication features of many HPC architec¬ 
tures. The model also represents features for overlapping computation with communication and 
latency hiding(for example, this feature represents the DMA access capability of the LANai net¬ 
work interface chip in Myricom systems [16]). In this case, a communication operation is assumed 
to take only Td time. 

Since many different platforms are under development, architecture independent algorithms 
and portability of the developed code are very important. We specify our algorithms using the 
MPI standard [17] to ensure portability of our code. 

In general, the data layout problem is to design a distribution of data among the processors 
such that the overall communication time is minimized, and the workload is balanced. In typical 
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signal processing applications, the data is accessed and processed in different ways in each sub¬ 
problem. The goal of data remapping is to rearrange the data between the steps of the algorithm 
so as to reduce the communication time. In some scenarios, the overhead incurred in performing 
the remapping operation might be larger than the benefit due to remapping. To estimate the 
tradeoffs, we use the model to evaluate the benefits and overheads due to data remapping. 

Most signal processing problems have regular and known computation and communication 
characteristics. This permits mapping and scheduling of computations to be performed at compile 
time. Indeed, this feature has been exploited in systolic arrays to transform the computations 
into a form that can be executed with localized communication. The scheduling problem for HPC 
platforms is different from that of custom VLSI systems on account of the larger granularity of 
the computations, large overheads in performing communication, and the capability to overlap 
computation with communication. 

3*7 Computational Characteristics of Element-Space STAP 

STAP consists of an antenna array (with N elements) that transmits pulses and then receives 
the reflected pulses. Each element accumulates data in this manner at L different ranges. Thus 
the total data collected in one PRI (Pulse Repetition Interval) has NL data samples. After M 
PRI’s, the accumulated data constitute a Coherent Processing Interval (CPI) data cube. Figure 
3.10 (a) shows the data arrival sequence, while Figure 3.10 (b) shows the data cube. 

A significant portion of STAP involves matrix algebra computations. STAP consists of three 
major stages - training strategy, weight computation, and weight application. Generally, weight 
computation requires the solution of a linear system of equations. This step is a computation 
intensive portion of the space time processor. Due to enormous computational requirements 
of fully adaptive STAP, many partially adaptive processing techniques are developed. In this 
report, we are considering on ESPrD (Element-Space Pre-Doppler) and ESPsD (Element-Space 
Post-Doppler) processing techniques, which are both partially adaptive STAP approaches. 

We parallelize weight computation steps of ESPrD and ESPsD, which are the major computa¬ 
tional bottlenecks in the entire processing. In the ESPrD, M f = M - K + 1(K =2^3) adaptive 
processing are performed over subsets of size N X K of a snapshot, as shown in Figure 3.11. The 
data cube is accessed along element dimension during adaptive processing. These subsets are 
formed in an overlapped fashion along PRI dimension. Each adaptation involves KN X KN di¬ 
mension linear equations which are to be solved by QR decompositions. After weight vectors are 
applied, Doppler processing (one M'-point FFT) is performed for a given snapshot. The weight 
computation steps of ESPsD are as follows: first, Doppler processing is performed separately on 
each element. This requires N M -point FFT’s along PRI dimension. Then, a different adaptive 
problem is solved for each Doppler bin, utilizing all elements. M QR decompositions of size 
N X N are performed and data cube is accessed along element dimension, as shown in Figure 
3.12. Figure 3.13 shows the computational requirements of ESPrD and ESPsD. 

For the range of 16 < M,N < 128, the estimated flop counts are shown in Figure 3.14,3.15, 
3.16,3.17. It can be easily observed that the flop count grows exponentially as N grows. This 
results from the fact that performing QR decomposition dominates other operations. Thus, the 
computational complexities of ESPrD and ESPsD are both 0(LMN 3 ). 
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Figure 3.10: The input data cube and sample arrival sequence 
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(LM' QR decompositions of size KNxKN) 
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• Doppler Processing: 

(L M'-point FFT’s) 
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Figure 3.13: Computational Complexities for ESPrD and ESPsD 
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Figure 3.14: Computational Requirements of ESPrD when L = 512 
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ESPsD (L=512) 



Figure 3.15: Computational Requirements of ESPsD when L = 512 


ESPrD (L=4096) 



Figure 3.16: Computational Requirements of ESPrD when L ~ 4096 
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ESPsD (L=4096) 



Figure 3.17: Computational Requirements of ESPsD when L — 4096 

3.8 Scalable Portable Algorithms for ESPrD and ESPsD 

In this section, we illustrate the design of scalable parallel algorithms for ESPrD and ESPsD. Let 
x nm i be the sample signal from the n-th element, m-th pulse, and at the l -th range-gate. The 
N antenna elements independently transmit/receive signals and compose part of the data cube. 
In implementing the STAP tasks on a parallel HPC platform, we assume that N I/O processors 
(c 0 ,.. ,,cjv-i) collect the data from each element and distribute them to P computing proces¬ 
sors (p 0 ,.. .,pp_i). The distribution of the data onto the processors affects the communication 
requirements, the degree of concurrency, and the load on the processors. Our goal is to design 
efficient algorithms which can provide high-speed execution on available machine sizes and real¬ 
istic problem sizes. For scalable performance, efficient data layout and data distribution schemes 
should be developed so that the parallelizing overhead is minimal; these schemes should minimize 
the inter-processor communication time. We parallelize weight computation steps of ESPrD and 
ESPsD at the task level. This means that individual tasks (FFT and QR decomposition) are 
performed sequentially on a single processor. However, parallelism is achieved by distributing 
whole tasks among the processing elements. The reason for employing task level parallelism is 
that the problem sizes of the tasks in ESPrD are too small to parallelize efficiently. Our previ¬ 
ous experience [9] shows that the cumulative communication overheads incurred in parallel FFT 
and QR decomposition algorithms (such as ScaLAPACK routines [7]), prevents scalable perfor¬ 
mance. We choose data mapping schemes, so that each node pi has all the data necessary for its 

computation. 

In this data mapping scheme, the communication occurs when N I/O processing nodes 
distribute their data to P computing nodes. This distribution constitutes an all-to-all communi¬ 
cation pattern. It should be noted that the number of computing nodes ( P ) is much larger than 
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the number of I/O processing nodes (AT), i.e. P N. This is because a large number of com¬ 
puting nodes are necessary to deal with the enormous computational requirements for real-time 
needs. We develop efficient distribution algorithms when a small set of processors distribute their 
data equally to a large set of processors. This happens frequently in the general real-time signal 
processing applications, because the number of sensors is limited while the number of process¬ 
ing nodes required for real-time needs is much greater. Also, when exploiting task parallelism, 
compute intensive tasks must be distributed among many processors to alleviate computational 
bottlenecks. 

A simple scheduling scheme consists of P steps of communication with messages of size ML/P 
as shown Figure 3.18. Using the GDM model, this communication needs T commi time, where 

T C omm i = P • Td + ML • Td (3.12) 

This is because at most N communications can be performed in parallel due to node contention, 
while there are NP communications to be performed. Thus, the cumulative start up overheads 
become large as P grows. Figure 3.19 shows communication scheduling which reduces cumulative 
start up overheads. It partitions P computing nodes into K = jt groups. Each group has N 
computing nodes. Thus, each I/O processing node can distribute ^ size data to first group, then 
second group, and so on. This distribution takes Td + As soon as all processing nodes 

in a group receive data from I/O processing nodes, the remapping operation can be started. 
The timing diagram shown in Figure 3.20(a) illustrates this pipelined communication scheduling. 
Because the remapping within a group takes (N - 1 ){Td + total communication time 

becomes 

T cm = K{T d + ^r d } + (N-l){T d +^T d } (3.13) 

(~ + N - l)T d + ML(1 + (3.14) 

In ESPsD, there are many FFT computations to be performed after the first distribution. There¬ 
fore, we can overlap the remapping process with FFT computations as shown in Figure 3.20(b). 
Also, separate communicators for distribution and remapping processes can be used, which MPI 
provides to avoid unnecessary synchronization during message passing. Figure 3.21 and Figure 
3.22 show the estimated communication time on SP2 and T3D for various values of L, M, and N. 


3.9 Conclusion 

In this work, we have developed scalable portable algorithms for partially adaptive STAP. The 
main contributions are in data remapping, scheduling and overlapping computation with com¬ 
munication. Our techniques result in uniform distribution of load on the processors as well as 
in minimization of the communication time. Earlier implementations are not scalable and the 
communication overheads dominate the overall execution time for a large number of processors. 
The implementations were performed using MPI and can be easily ported to other machines, in¬ 
cluding emerging HPC platforms for embedded systems applications such as the Martin Marietta 
Lab’s High Performance Scalable Computing System [3]. 

We have also parallelized weight computation steps of ESPrD (Element Space Pre-Doppler) 
and ESPsD (Element Space Post-Doppler) STAP. We have developed efficient distribution algo¬ 
rithms when a small set of processors distribute their data equally to a large set of processors. We 
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P computing nodes ( P=KN ) 

































































GroupO 



(a) Timing diagram 



(b) Overlapping communications with computations 

Figure 3.20: Timing diagram and overlapped communicaiton with computation 
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Communication Times on SP-2 



Figure 3.21: Estimated communication time on SP2 (o: Pipelined schedule, x: Simple Schedule) 
for N,M,L = (64,64,2048), (32,32,1024), (16,16,512). 
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Communication Times on T3D 



Figure 3.22: Estimated communication time on T3D (o: Pipelined schedule, x: Simple Schedule) 
for N,M,L = (64,64,2048), (32,32,1024), (16,16,512). 
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have developed efficient distribution algorithms by using data remapping and processor groups, 
which will therefore reduce the effect of node contention significantly. Although we have focused 
on two variations of STAP, the techniques of communication scheduling and overlapping com¬ 
putation with communication are adequately general. They can therefore be applied to several 
other signal processing computations to yield efficient implementations. 

Our algorithms can also be easily ported to shared memory systems. The Prefetch command 
can be used for latency hiding, thus, the computation and communication can be overlapped. It 
is easy to perform a similar analysis as shown in this section. For example, on shared memory 
machine models, one can easily perform the analysis using the Block Distributed Memory model 

[9]. 

Emerging HPC platforms provide many features for communication that can be accessed via 
standards such as MPI. Algorithmic techniques can exploit these to lead to scalable portable algo¬ 
rithms for Linear Algebra problems that offer superior performance compared with ScaLAPACK 
for the size of problems encountered by the signal processing and ATR communities. 


* 
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